Stoelzel Software Technologie SST
         
         
 
Please note: this application is still a prototype!
The HTMLStripper is currently still at a very early stage of development. Version 0.3 is a preliminary and very rudimentary implementation, not a fully developed and tested product.
Although it is stable, many of its features are not yet fully functional, do not function correctly, or have yet to be implemented.
Nonetheless, it can already be used to a certain, very limited, extent.
       
HTMLStripper Version 0.3

Preliminary User Guide
       
This documentation is preliminary in nature. It applies to the prototype (version 0.3) of the application currently named HTMLStripper, whose user interface and functionality are likely to be subject to numerous changes in the prototype versions leading up to the first fully implemented version, 1.0.
It therefore assumes that the user is at least familiar with the basics of using Microsoft Windows applications. Furthermore, as the guide does not provide information on how to integrate the generated data into HTML/XML or Microsoft Compiled Help (.CHM) projects, the user should be proficient in the use of the applications required for these tasks to make full use of the described features.
Nonetheless, the guide does cover the most relevant aspects of using the application. Specifically, these are:
Basic Procedure(s)
Acquiring and providing the source file(s)
Converting the source file into an ANSI or 16-bit Unicode encoded file, if necessary
Opening the ANSI or Unicode encoded source file in the SST HTMLStripper application
Deciding on and creating a list of words to suppress/ignore
Stripping the source file of its tags
Selecting and/or editing the words and/or phrases to add to the keywords list
Adding words, phrases, etc., from the word list(s) and stripped contents to the keywords list
Exporting the keywords as a keywords meta tag or as a Microsoft Compiled Help (.hhk) keyword index file
Saving the created and/or processed files
Acquiring and Providing the Source File(s)
Performing the following three steps is only necessary if you can't access the source code files on a local or mapped network drive (i.e. if they can only be accessed by means of an Internet browser, FTP program, etc.).
1.  Open the URL of the page you want to create a keyword index or keyword meta tag for. Although this can be done in the integrated browser, the pages of many websites on the Internet are not displayed correctly in it.
This is not an inadequacy or bug of the HTMLStripper prototype! It can be attributed exclusively to the fact that many web developers obviously don't deem it necessary to implement their websites for anything but the latest browser generation.
However, performing this step in the integrated browser is merely a matter of convenience, not of functionality. Opening the page in the browser you normally use will, in most cases, do just as well.
2.  Once it has been fully loaded, save the page to a file on your hard disk (or some other storage medium that will make the HTML/XML source code file available on your computer).
However, irrespective of the browser you use, we recommend saving the file encoded in either the ANSI or the Windows Unicode character set.
Although the prototype appears to work faultlessly with European languages encoded in UTF-8, this recommendation also applies to pages that contain characters not found in the English alphabet (e.g. German umlauts, French accents, etc.).
It is also imperative that the page be saved as an HTML (or XML) file, and therefore as a plain text file (even if the file extension/suffix is not .txt!).
This should be borne in mind, because some browsers save web pages in their entirety (i.e. including graphics and everything else) in a compound file that can only be reliably opened in the browser with which it was created. If you're uncertain whether this is the case, we recommend opening the saved file in a text editor (e.g. Windows Notepad) to verify that it is not. If the file contains readable text together with a lot of undecipherable symbols, it is more than likely a compound file (for example, a .mht file). In such cases, simply save the file again in another format or using a different browser.
If you decide to use the integrated browser, you should also be aware that, unlike most other browsers, it does not save the graphics and other files (e.g. scripts, style sheets, etc.) referenced by the source code to disk together with the page. This has no detrimental effect on the HTMLStripper's ability to process the saved page/file. It merely results in the page not being displayed with all its visual (and/or acoustic) elements.
3.  Repeat the steps just described for all files you want to process or include in your keyword index.
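If you are unsure whether a saved page is a compound file, the Notepad check described in step 2 can also be approximated with a short script. The following Python sketch is a rough heuristic, not a feature of the HTMLStripper, and its function names are hypothetical. It assumes that a compound web archive such as a .mht file begins with MIME headers that declare a "multipart/related" body, whereas a plain HTML file does not.

```python
# Rough heuristic (hypothetical helpers, not part of the HTMLStripper) for
# recognising compound web archives such as .mht files, which are MIME
# messages bundling a page together with its graphics, scripts, etc.
from pathlib import Path


def looks_like_mhtml(data: bytes) -> bool:
    """Return True if the start of the file resembles a MIME web archive."""
    # latin-1 maps every byte to a character, so decoding cannot fail.
    head = data[:4096].decode("latin-1").lower()
    return "mime-version:" in head and "multipart/related" in head


def check_saved_page(path: str) -> str:
    """Describe whether the saved file looks usable as a source file."""
    if looks_like_mhtml(Path(path).read_bytes()):
        return f"{path}: compound file, save the page again as HTML"
    return f"{path}: plain HTML/text file"
```

For example, `print(check_saved_page("tools.htm"))` reports on a saved page (the file name is only an example). A negative result does not prove the file is suitable; it merely rules out the most common kind of compound file.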
Using the Integrated Browser to Acquire and Save the Source Code
These steps only need to be performed if you want to use the Integrated Browser, don't have access to the HTML (or XML) source code on a local (or LAN) drive, and have not already performed the steps described under Acquiring and Providing the Source File(s) in another application.
Using the Integrated Browser of the HTMLStripper prototype has both drawbacks and advantages. These are:
•  The page may not be displayed correctly (see remark, above).
•  When saving a page in the Integrated Browser, external files (e.g. graphics, videos, scripts, style sheets, etc.) are not saved with it, resulting in the page not being displayed correctly if subsequently opened in a browser.
•  You don't have to switch between applications.
•  If the page is displayed correctly in the Integrated Browser, it can be saved directly in the required format, making converting the file into an ANSI or Unicode encoded file superfluous.
Should you decide to use the integrated browser (in spite of its current shortcomings), proceed as follows:
1.  Click on the "Browser" button-style tab (below the combo boxes and Button Tool Bar) to switch to the Browser Tab Sheet.
2.  Enter the Internet address (aka URL) of the page to open in the HTMLStripper's Source File(s) Combo Box.
3.  Select/click either the Open URL Menu Item in the View Menu of the Main Menu or the Open URL Button in the Button Tool Bar.
4.  Wait for the page to be loaded in the Integrated Browser.
5.  Open the File Menu in the Main Menu.
6.  Select/click the File Menu's "Save as ..." Menu item. This will open the Save HTML Document Dialog.
7.  In the Language or Encoding Drop-Down List *1, at the very bottom of the Save HTML Document Dialog, select the character set and/or language in which the text that is displayed to users was authored. If the Language Drop-Down List provides multiple choices, you can select either an item with the supplement "(Windows)" or "(ISO)", or the "Unicode" item. Alternatively, if the page's text is displayed correctly with the preselected item, and this has one of the required supplements (e.g. "(Windows)"), you can simply continue with the next step.
8.  In the Save as Type Drop-Down List, which is located immediately below the File Name Combo Box, select either the "HTML File (*.htm, *.html)" or the "Text File (*.txt)" item. Both file types can subsequently be opened/loaded as source files and processed by the HTMLStripper. However, if the latter (i.e. the "Text File (*.txt)") is chosen, most browsers, including the Integrated Browser, will not display formatted output. In other words, the source code will be displayed as in the HTML Source File Viewer/Editor or any other editor.
9.  Save the file to disk through a click on the Save Button.
Converting the Source File into an ANSI or Unicode Encoded File
Unfortunately, to produce correct output, the current prototype (version 0.3) may still require the source code to be converted manually into an ANSI or Unicode encoded file.
Whether or not this is necessary can depend on various factors, individually or in combination, such as the language(s) in which the text exposed to the user was authored, the operating system (its version, edition, and user interface language), and the Internet (or other) application (e.g. browser, e-mail client, etc.) used to create or save the source file.
However, in the event that a conversion is unavoidable, Windows Notepad, the text editor that ships with all Microsoft Windows operating systems as of Windows Vista, can in most cases be used to perform it. Here is how it's done:
1.  Open the source file in Windows Notepad.
2.  Open the Notepad's File Menu and select/click the Save As Menu Item.
3.  In the Save As Dialog, enter a slightly different name for the file in the dialog's File Name Combo Box. For example, you could simply append a capital A or U to the name part of the file name (not to the extension/suffix) to mark it as the ANSI or Unicode (encoded) version of the original file.
4.  Depending on the Windows version you are using, select either the "ANSI", the "Unicode", or the "UTF-16 LE" item in the Encoding Drop-Down List at the very bottom of the dialog. The "Unicode" and "UTF-16 LE" items save the file in the required Unicode format; the "ANSI" item saves it as an ANSI encoded file.
5.  Close the dialog by means of a click on its Save Button.
6.  Open the file you have just created and saved in either another instance of Notepad or the Internet application for which it was developed, and verify that the contents are displayed correctly.
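Where many files need converting, the Notepad procedure above can be scripted instead. The following Python sketch is an illustration under stated assumptions, not a feature of the HTMLStripper, and its helper names are hypothetical. It assumes the original file is UTF-8 encoded, writes "Unicode" as UTF-16 LE with a byte order mark (the format Notepad's "Unicode" option produces), and treats "ANSI" as Windows code page 1252, which only matches Western European Windows systems.

```python
# Sketch of the Notepad conversion described above (hypothetical helpers).
# Assumptions: the original file is UTF-8 encoded, "Unicode" means UTF-16 LE
# with a byte order mark, and "ANSI" means Windows code page 1252.
import codecs
from pathlib import Path


def converted_name(path: Path, marker: str) -> Path:
    """Append a marker such as "U" or "A" to the name part, keeping the extension."""
    return path.with_name(path.stem + marker + path.suffix)


def to_unicode_bytes(text: str) -> bytes:
    """UTF-16 LE preceded by a BOM, the format Notepad calls "Unicode"."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def to_ansi_bytes(text: str) -> bytes:
    """Windows-1252; raises UnicodeEncodeError for characters outside it."""
    return text.encode("cp1252")


def convert(path: str, as_unicode: bool = True) -> Path:
    src = Path(path)
    # Decode the raw bytes so the original CR/LF line endings survive;
    # "utf-8-sig" also accepts a leading UTF-8 byte order mark.
    text = src.read_bytes().decode("utf-8-sig")
    if as_unicode:
        dst, data = converted_name(src, "U"), to_unicode_bytes(text)
    else:
        dst, data = converted_name(src, "A"), to_ansi_bytes(text)
    dst.write_bytes(data)
    return dst
```

Calling `convert("tools.htm")` would write "toolsU.htm" next to the original, following the naming suggestion in step 3. As with Notepad, the result should still be checked as described in step 6.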
Opening the ANSI or Unicode encoded Source File in the HTMLStripper App
Once you have the source code in one of the two formats that are currently fully supported (i.e. either an ANSI or a Windows Unicode encoded HTML or XML file), you can load it into the HTMLStripper's Source Code Viewer/Editor.
1.  To open the dialog with which you can select the HTML (or XML) source file, you can use either the "Select Source File(s) ..." menu item in the File Menu of the Main Menu or the corresponding button in the (Button) Tool Bar. A click on either one will open the Open File Dialog.
2.  Using the controls at the top of the Open File Dialog, select and open the folder in which the source file(s) you want to process is/are located.
3.  Select the file you want to process in the Open File Dialog's List View.
4.  Close the dialog by means of its Open Button.
5.  Verify that the source code is displayed in the viewer/editor on the HTMLStripper's HTML Source File Tab Sheet.
Deciding on and Creating a List of Words to Suppress/Ignore
Normally, opening/loading and/or editing the list of words to suppress/ignore would be the next step, prior to reducing the source code to the text(s) exposed to the user.
However, initially you are unlikely to know (exactly) which words were used and how often they occur in the texts you are processing, nor how relevant (or irrelevant) they may be as keywords in a keyword meta tag or an index. Furthermore, adding the words to ignore word by word would be tedious, to say the least. It is far easier to copy all the words used in the texts from the generated lists of distinct words into the list of words to suppress, and then remove those you consider relevant, or at least potentially relevant.
Nonetheless, so that you can see the effect the list of words to suppress/ignore has on the resulting list of distinct words, you might want to suppress words such as "menu", "cookie", or any other common words that you can think of and that are sure to occur in the texts of your first source files. To do this:
1.  Switch to the "Suppress(ed) Words" tab sheet by clicking on the "Suppress(ed) Words" button-style tab, immediately below the combo boxes and Button Tool Bar, in the upper part of the main window.
2.  Enter each word you want to suppress/ignore on a separate line.
3.  When you're done, save the list by selecting/clicking on the "Save as ..." menu item in the File Menu of the Main Menu or by pressing the Save As Button in the Button Tool Bar.
When you process the first source file(s), the words you have just entered in the Suppress(ed) Words Editor will not appear in the list of words on the Distinct Words Tab Sheet (in other words, they will have been suppressed). Saving the list also results in it being loaded automatically when you run the HTMLStripper.
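The effect of a suppress list can be pictured with the following Python sketch. It is an illustration only, not the HTMLStripper's implementation: the function names are hypothetical, and the case-insensitive matching is an assumption, since this guide does not state how the prototype compares words.

```python
# Illustrative sketch: applying a one-word-per-line suppress list to a list
# of distinct words. Case-insensitive matching is an assumption.


def load_suppress_list(text: str) -> set[str]:
    """One word per line; blank lines and surrounding spaces are ignored."""
    return {line.strip().casefold() for line in text.splitlines() if line.strip()}


def filter_distinct_words(words: list[str], suppressed: set[str]) -> list[str]:
    """Keep only the words that are not on the suppress list."""
    return [w for w in words if w.casefold() not in suppressed]


suppress_text = "menu\ncookie\n"
words = ["Classes", "Cookie", "Menu", "TAboutBox", "TForm1"]
print(filter_distinct_words(words, load_suppress_list(suppress_text)))
# ['Classes', 'TAboutBox', 'TForm1']
```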
To use a different list with other source files, simply load the desired list by
1.  switching to the "Suppress(ed) Words" Tab Sheet, as already described,
2.  selecting/clicking the "Open ..." menu item in the File Menu of the Main Menu to display the Open File Dialog,
3.  using the controls at the top of the Open File Dialog to open the folder in which the suppress(ed) words list text file is located,
4.  selecting the suppress(ed) words list file you want to use, in the Open File Dialog's List View,
5.  and closing the dialog by means of its Open Button.
Processing the Source Code
Processing the source code, i.e. stripping it of its tags and generating the list of distinct words, is probably the easiest part of the whole procedure. All you have to do, once the source file has been loaded into the Source Code Editor, is:
1.  Switch to the Stripped Contents or Distinct Words List Tab Sheet
2.  Select/click Run, in the Assignment Menu of the Main Menu or click on the corresponding button in the (Button) Tool Bar.
3.  Wait for the stripped contents to be displayed in the Stripped Contents Editor, or in the lists on the Distinct Words List Tab Sheet
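Conceptually, processing means reducing the source code to the text exposed to the user and counting the words in it. The following Python sketch shows one way to do this with the standard library's `html.parser`; it is not the HTMLStripper's own algorithm. The word pattern (letters, digits, and underscores, so identifiers such as C_ONLINEHELPURL stay intact), the case-sensitive counting, and the skipping of script and style content are assumptions.

```python
# Sketch of tag stripping and distinct-word counting (not the HTMLStripper's
# algorithm). Assumed word pattern: runs of letters, digits, and underscores.
import re
from collections import Counter
from html.parser import HTMLParser

WORD = re.compile(r"\w+")


class TextExtractor(HTMLParser):
    """Collects character data, skipping the contents of script and style."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into text
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_tags(html: str) -> str:
    """Return the page text with all whitespace runs collapsed to one space."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def distinct_words(text: str) -> Counter:
    """Map each distinct word (case-sensitive) to its number of occurrences."""
    return Counter(WORD.findall(text))


page = ("<html><head><style>p {}</style></head><body>"
        "<p>Tools &amp; more tools</p><script>run()</script></body></html>")
text = strip_tags(page)
print(text)  # Tools & more tools
```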
Adding Keywords to the Keywords List
The current prototype of the HTMLStripper provides five simple methods by which keywords (and/or phrases) can be added to the list of keywords in the Keywords List View (which, as you'll already have guessed, is located on the Keywords Tab Sheet).
The first two methods are ideal if you have a lot of unusual, individual words, for example if you're a software developer who is documenting types, variables, classes, and functions, all of which normally consist of one word only.
The third method provides greater flexibility in that phrases can also be added as keywords, directly from the text on the Stripped Contents Tab Sheet. The fourth method is particularly useful in combination with the third, in that it (somewhat) simplifies creating the inverted forms of keyword phrases. The fifth method is essentially the basis of the fourth and is the simplest of all.
Common to all methods is that you can add words and/or phrases as often as you like. Only the first time you add a particular word or phrase is it actually added to the Keywords List View. However, if you retain your keyword list, process a second source file, and add a word (or phrase) that is already in the list, the name of the source file and its title will be added to the respective columns of the existing keyword.
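The behaviour described in the previous paragraph amounts to a lookup table keyed by keyword. The following Python sketch models it under stated assumptions; it is not the HTMLStripper's implementation. The class and field names are hypothetical, keywords are matched exactly (case-sensitively), and adding a keyword again from the same source file changes nothing.

```python
# Hypothetical model of the Keywords List: each keyword is stored once, and
# adding it from a further source file only extends its file/title columns.
from dataclasses import dataclass, field


@dataclass
class KeywordEntry:
    keyword: str
    files: list[str] = field(default_factory=list)
    titles: list[str] = field(default_factory=list)


class KeywordList:
    def __init__(self) -> None:
        self._entries: dict[str, KeywordEntry] = {}

    def add(self, keyword: str, source_file: str, title: str) -> None:
        entry = self._entries.get(keyword)
        if entry is None:
            # First occurrence: create the keyword row.
            entry = self._entries[keyword] = KeywordEntry(keyword)
        if source_file not in entry.files:
            # New source file for this keyword: extend the file/title columns.
            entry.files.append(source_file)
            entry.titles.append(title)

    def __len__(self) -> int:
        return len(self._entries)

    def __getitem__(self, keyword: str) -> KeywordEntry:
        return self._entries[keyword]


kw = KeywordList()
kw.add("TForm1", "classes.htm", "(SST) ShlWAPIFunctionInfo Classes")
kw.add("TForm1", "classes.htm", "(SST) ShlWAPIFunctionInfo Classes")
kw.add("TForm1", "tools.htm", "Tools")
print(len(kw), kw["TForm1"].files)  # 1 ['classes.htm', 'tools.htm']
```

The file names in the usage example are placeholders.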
Method 1 (Selecting a range of text in the Distinct Words Editor)
1.  In the Distinct Words Editor on the left-hand side of the Distinct Words Tab Sheet select the range of text that contains the words you want to add to your list of keywords.
Text can be selected in the Distinct Words Editor as in any other editor or word processor, either by positioning the caret in it and dragging the mouse over the text while keeping the left mouse button pressed, or by means of the keyboard's Shift and arrow keys.
Alternatively, you can delete all those words you don't want as keywords and select/click the Select All menu item in either the Edit Menu of the Main Menu or in the Distinct Words Tab Sheet's Context Menu.
2.  Select/click either the Add Words to Keywords Menu Item in the Edit Menu of the Main Menu or in the Distinct Words Tab Sheet's Context Menu. This will add all words that occur in the selected text range as individual keywords to the Keywords List on the Keywords Tab Sheet.
Method 2 (Selecting Words in the Distinct Words List)
1.  In the Distinct Words List on the right-hand side of the Distinct Words Tab Sheet select the words that you want to add to your list of keywords.
Individual words in the Distinct Words List are selected simply by clicking on them.
Multiple non-adjacent words can be selected by holding down the Control (Ctrl) Key on the keyboard and clicking on the individual words you want to select.
A range of adjacent words can be selected by holding down the Shift Key on the keyboard, clicking on the first (or last) word of the range and, while keeping the Shift Key pressed, clicking on the last (or first) word of the range.
2.  As in Method 1, above, selecting/clicking either the Add Words to Keywords Menu Item in the Edit Menu of the Main Menu or in the Distinct Words Tab Sheet's Context Menu will add the selected words to the Keywords List on the Keywords Tab Sheet.
Selecting and Editing Words and/or Phrases to Add to the Keywords List
Apart from adding words from the list of distinct words, you can also select and add both individual words and phrases from the Stripped Contents as keywords.
Method 3 (Selecting a text segment in the Stripped Contents Editor)
1.  Through a click of the mouse, position the cursor/caret in the Stripped Contents Editor and mark/select the word or phrase you would like to add to the list of keywords. Note, should the word or phrase extend over several lines, the HTMLStripper will reduce the selected text range to a single line prior to adding it to the Keyword List.
2.  Open either the Edit Menu in the Main Menu or the Stripped Contents Editor's Context Menu.
3.  In whichever of the two menus you have opened, select/click the "Add the Selection as Keyword" menu item.
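The note in step 1 about reducing a multi-line selection to a single line can be illustrated as follows. This Python sketch assumes that line breaks, together with any surrounding whitespace, are replaced by single spaces; the exact rule the HTMLStripper applies is not documented here, and the function name is hypothetical.

```python
# Minimal sketch (hypothetical helper): reduce a selection spanning several
# lines to one line by collapsing every run of whitespace, including line
# breaks, into a single space.


def to_single_line(selection: str) -> str:
    return " ".join(selection.split())


print(to_single_line("Intel CPU\r\n  native code"))  # Intel CPU native code
```

Note that this also collapses multiple spaces within a line, which is usually desirable for keywords.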
Method 4 (Selecting a text segment in the Stripped Contents Editor and Editing it)
This method was primarily devised to make it a little easier to create inverted (keyword) phrases and add them, together with the original phrase, more or less in one step.
1.  As in Step 1 of Method 3, position the cursor/caret in the Stripped Contents Editor and select the word or phrase you want to add to the Keywords List.
As you may have noticed, when you select text in the Stripped Contents Editor, the selected text is replicated in the Keyword Combo Box, which, by default, is situated immediately above the Tab Sheets. This has three advantages when adding a phrase as a keyword:
If you have selected a phrase that extends over several lines, it is reduced to a single line in the Keyword Combo Box.
If you want to rephrase (e.g. invert) the phrase, the cursor is closer to the combo box, which already contains the text you will need to modify.
You don't have to open a menu and select the appropriate menu item to add the selected (or modified) text as a keyword. It can be added, simply by pressing the Add/Apply Button, to the right of the Keyword Combo Box.
2.  Press the Add/Apply Button (located on the right-hand side of the Keyword Combo Box), to add the unmodified phrase (you have marked/selected in the Stripped Contents Editor) to the Keywords List.
3.  Position the cursor in the Keyword Combo Box and make the desired changes to the word or phrase.
For example, you might want to invert the original phrase you have just added in steps 1 and 2. If this phrase were "system level applications", all you would have to do is cut "system level" out of the text in the Keyword Combo Box and append it, together with a comma, to the sole remaining word "applications". The resulting inverted phrase in the Keyword Combo Box would then be "applications, system level".
4.  Press the Add/Apply Button again. This will add the inverted or otherwise modified phrase (or word) to the Keywords List as well.
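The edit performed by hand in step 3 follows a simple rule: the leading words are moved behind the rest of the phrase, separated by a comma and a space. The following Python sketch expresses that rule; `invert_phrase` is a hypothetical helper, not an HTMLStripper function, and the number of leading words to move has to be chosen by you.

```python
# Hypothetical helper reproducing the manual inversion from step 3: the first
# `lead` words are moved behind the remaining words, after a comma and space.


def invert_phrase(phrase: str, lead: int) -> str:
    words = phrase.split()
    if not 0 < lead < len(words):
        raise ValueError("lead must leave at least one word on each side")
    return " ".join(words[lead:]) + ", " + " ".join(words[:lead])


print(invert_phrase("system level applications", 2))  # applications, system level
print(invert_phrase("Intel CPU native code", 2))      # native code, Intel CPU
```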
Method 5 (Entering the Keyword Directly in the Keyword Combo Box or Distinct Words Editor)
Because it is the only method that does not require processing a source file first, and can even be applied without one altogether, this method is ideally suited to creating template keyword lists and files. This is useful if you have numerous "keyword projects" in which you want to include certain words and/or phrases in all or a group of meta tags and/or index files.
1.  Enter the word or phrase you want to add to the keywords template list in the Keyword Combo Box.
2.  Press the Add/Apply Button to add the word or phrase to the Keywords List on the Keywords Tab Sheet.
3.  Repeat steps 1 and 2 as often as necessary to complete your template.
4.  Save the Keywords List as described under ...
Alternatively, if you have a large number of words and/or phrases you want to include per default in your index projects, you can achieve the same result by
1.  switching to the Distinct Words Tab Sheet,
2.  positioning the cursor/caret in the Distinct Words Editor on the left-hand side of the Distinct Words Tab Sheet,
3.  selecting and deleting any remnants of previously generated lists,
4.  entering the words and/or phrases, one word or phrase per line, in the Distinct Words Editor,
5.  selecting the entire text in the Distinct Words Editor,
6.  opening the Main Menu's Edit Menu or the Distinct Words Editor's Context Menu and selecting/clicking "Add Words to Keywords",
7.  and saving the keywords list to a file, as described under Saving the Created and/or Processed Files, below.
Deleting Text, Words, and Keywords
With the exception of the HTML Source Code Viewer/Editor (which is currently still only a viewer and not an editor) and the list on the Auto Find and Add Tab Sheet, it is possible to delete all texts, phrases, and words, including the keywords on the Keywords Tab Sheet, down to individual characters/letters. However, in version 0.3, the methods by which this can be achieved still differ.
Whereas the words and/or phrases in the Keywords and Distinct Words lists must be deleted by means of the Delete menu items in the Main Menu, the respective list's context menu, or the Delete Button in the Button Tool Bar, the texts in most of the other controls can only be deleted by means of the Delete (Del) Key on the keyboard.
The following table summarizes the issue.
Control                         Delete Key    Delete Menu Item   Delete Menu Item   Delete Button in
                                on Keyboard   in Main Menu       in Context Menu    Button Tool Bar
Source File(s) Combo Box        Yes
Keyword Combo Box               Yes
HTML Source Code Viewer/Editor
Integrated Browser
Stripped Contents Editor        Yes
Distinct Words Editor           Yes
Distinct Words List View                      Yes                Yes                Yes
Keywords List View                            Yes                Yes                Yes
Auto Find and Add List View
Suppressed Words Editor
Log                             Yes
Exporting the Keywords List
Once you're done with adding words, phrases, and references/links to the Keywords List, you can export it as an HTML keywords meta tag and/or a Microsoft Compiled Help .hhk index file. To do either or both:
1.  Open the File Menu in the Main Menu.
2.  Open the "Export as" menu item's sub-menu (by moving the cursor over the "Export as" menu item).
3.  Select/click the Keywords Meta Tag or Compiled Help Index File menu item to save the keywords either as a keywords meta tag or as an .hhk index file, respectively. Both menu items open the Save As Dialog with the appropriate file type preselected in the Save as Type Drop-Down List.
4.  In the Save As Dialog, use the controls at the top of the dialog to navigate to the folder in which you want to save the respective file.
5.  Enter the file name under which you would like to save the exported file in the File Name Combo Box. The File Name Combo Box is located in the lower half of the dialog between the dialog's File List View (above it) and the Save as Type Drop-Down List (below it). Alternatively you can simply retain the file name we've suggested and which is already entered in the File Name Combo Box.
6.  Close the dialog by means of its Save Button.
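To illustrate what the two export formats contain, the following Python sketch generates a keywords meta tag and a minimal .hhk index from a keyword list. It is an illustration under assumptions, not the HTMLStripper's exporter: the input structure and function names are hypothetical, and the .hhk layout follows the sitemap format commonly written by Microsoft HTML Help Workshop, in which each index entry names the keyword and then lists one Name/Local pair per referenced topic. The HTMLStripper's actual output may differ in header lines, indentation, or ordering.

```python
# Sketch of the two export formats (hypothetical helpers). Input: a list of
# (keyword, [(topic title, topic file), ...]) pairs.
from html import escape


def keywords_meta_tag(keywords: list[str]) -> str:
    """Return an HTML keywords meta tag with comma-separated keywords."""
    content = ", ".join(keywords)
    return f'<meta name="keywords" content="{escape(content, quote=True)}">'


def hhk_index(entries: list[tuple[str, list[tuple[str, str]]]]) -> str:
    """Return a minimal HTML Help index (.hhk) in sitemap format."""
    lines = [
        '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">',
        "<HTML>",
        "<HEAD>",
        "<!-- Sitemap 1.0 -->",
        "</HEAD><BODY>",
        "<UL>",
    ]
    for keyword, topics in entries:
        lines.append('\t<LI> <OBJECT type="text/sitemap">')
        lines.append(f'\t\t<param name="Name" value="{escape(keyword)}">')
        for title, local in topics:
            # Each referenced topic is given as a Name/Local pair.
            lines.append(f'\t\t<param name="Name" value="{escape(title)}">')
            lines.append(f'\t\t<param name="Local" value="{escape(local)}">')
        lines.append("\t\t</OBJECT>")
    lines += ["</UL>", "</BODY></HTML>"]
    return "\n".join(lines) + "\n"


print(keywords_meta_tag(["Classes", "TForm1"]))
# <meta name="keywords" content="Classes, TForm1">
```

The sample keywords are taken from the step-by-step example later in this guide; the topic file names you would pass to `hhk_index` are placeholders.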
Saving the Created and/or Processed Files
With the exception of the source file, the page displayed in the Integrated Browser, and the list on the Auto Find and Add Tab Sheet, the loaded and generated files can be saved individually and/or collectively.
Unlike the source file and the keywords in the Auto Find and Add List, which cannot be saved at all, the pages/files in the Integrated Browser can be saved, but only individually, on a per-page/file basis. Exported files can likewise only be saved individually, on a per-file basis.
Saving the Page(s)/File(s) Displayed in the Integrated Browser
Note, the first three steps only have to be performed if the page has not already been loaded or you have not visited it during your current HTMLStripper session.
1.  Enter the full URL (or path and name) of the page to save in the Source File(s) Combo Box.
2.  Press/click on the Open URL Button in the Button Tool Bar.
3.  Wait for the page to be displayed in the Integrated Browser.
4.  Should you have switched to another tab sheet, return to/reopen the Browser Tab Sheet.
5.  Open the File Menu in the Main Menu.
6.  In the File Menu, select/click the "Save as ..." Menu item to open the Save HTML Document Dialog.
7.  In the Language (or Encoding) Drop-Down List*1 of the Save HTML Document Dialog, select the character set/language in which the text exposed to users was authored.
8.  In the Save as Type Drop-Down List, select the type of file as which you want to save the source code of the page in the Integrated Browser.
9.  Close the Save HTML Document Dialog by means of its Save Button.
 
Saving Files Individually
The files the HTMLStripper generates and/or the modifications you make to the files on the Stripped Contents, Distinct Words List, Keywords, Suppress(ed) Words, and Log tab sheets can be saved on a per tab sheet basis by
1.  switching to the tab sheet on which the file you want to save is located,
2.  opening the Main Menu's File Menu, and
3.  selecting/clicking the Save or Save As Menu Item.
If you have not previously saved a particular tab sheet's file(s), the Save Menu Item will automatically open the Save As Dialog, in which you can select and open the folder in which you want to save the file, specify a file name, and select the file type. Otherwise, the contents of the previously saved file will be replaced by the current contents.
The Save As Menu Item always opens the Save As Dialog, irrespective of whether the file has already been saved or not.
Saving Files Collectively
When processing multiple source files, it would be a nuisance to have to save the file on each tab sheet individually. Furthermore, doing so would be error prone, in that saving the contents of a tab sheet could easily be forgotten. This can be avoided by saving the files collectively.
Saving the files collectively is essentially equivalent to switching to each tab sheet and saving its file individually, except that this is performed automatically. To save the files collectively:
1.  Open the File Menu in the Main Menu.
2.  Select/click the Save All Menu Item.
3.  If the Save As Dialog is opened because a file needs to be saved for the first time and you do not want to skip saving it, proceed as if you were saving the file individually. Otherwise (i.e. if you don't want to save the file), simply close the dialog by means of its Cancel Button. The HTMLStripper will then continue to save the remaining files (if any).
Opening/Loading Files
In this provisional version of the HTMLStripper, it is only possible to open/load files into the viewers/editors on the following tab sheets:
HTML Source File
Browser
Stripped Contents
Keywords
Suppress(ed) Words
It is not yet possible to load the other files produced by HTMLStripper into the appropriate viewers/editors on the other tab sheets (e.g. the Distinct Words, Log, etc.).
Furthermore, when loading files into the Stripped Contents and Suppress(ed) Words Editors, these have to be ANSI encoded files. Only the Source Code Viewer/Editor on the HTML Source Code Tab Sheet and the Integrated Browser are currently capable of handling and displaying Unicode files as well. Attempting to load Unicode encoded files into the viewers/editors on the Stripped Contents and/or Suppress(ed) Words tab sheets will result in the files not being displayed correctly. Editing and saving these files will also fail and/or may lead to data corruption.
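Because loading a Unicode file into these editors can lead to data corruption, it may be worth checking a file before opening it. The following Python sketch (hypothetical helpers, not part of the HTMLStripper) only looks for a byte order mark (BOM), such as the one Notepad writes when saving a file as "Unicode". It cannot recognise Unicode files saved without a BOM, so a negative result is merely an indication, not proof, that the file is ANSI encoded.

```python
# Rough pre-check (hypothetical helpers): detect a byte order mark to spot
# Unicode files before loading them into the ANSI-only editors.
import codecs
from typing import Optional

BOMS = (
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
)


def detect_bom(data: bytes) -> Optional[str]:
    """Return the name of the encoding indicated by a leading BOM, if any."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def is_probably_ansi(path: str) -> bool:
    """True if the file starts without a BOM (it may still be Unicode)."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4)) is None
```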
Nonetheless, within the limitations imposed by the current functionality, files can be loaded into the respective viewers/editors much as they are saved individually, that is, by
1.  switching to the tab sheet containing the viewer/editor in which the file should be opened,
2.  opening the File Menu in the Main Menu,
3.  selecting/clicking the Open Menu Item in the File Menu to display the Open Dialog,
4.  selecting the file to open in the Open Dialog's File List View (located in the middle of the dialog) or entering its name in the File Name Combo Box, in the lower half of the dialog, above the Files of Type Drop-Down List,
5.  and closing the Open Dialog by means of a click on the Open Button.
Creating a Compiled Help, .hhk Index File, Step by Step
1.  Open the following three pages on our website and save each page as a separate, ideally complete, HTML page to an easily accessible location on your computer.
ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents
(SST) ShlWAPIFunctionInfo Classes
Tools
2.  Open the Open File Dialog to select the first source file to process, by selecting/clicking on the "Select Source File(s) ..." menu item in the File Menu of the Main Menu.
3.  In the Open File Dialog's Files List View, select the HTML file of the first page you saved in Step 1. This should be the file to which you saved the "ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents" page.
4.  Close the Open File Dialog by means of its Open Button.
5.  Wait for the source code to be displayed on the HTML Source File Tab Sheet.
6.  Switch to the Distinct Words List Tab Sheet.
7.  Process the file by opening the Assignment Menu in the Main Menu and selecting/clicking the Run Menu Item
8.  Wait for the word lists to be displayed in both panes of the Distinct Words List Tab Sheet
9.  In the left pane on the Distinct Words List Tab Sheet, select the entire text, from the word C_ONLINEHELPURL on downward
10.  Open the Edit Menu in the Main Menu.
11.  Select/click the Add Words to Keywords Menu Item.
12.  Switch to the Keywords Tab Sheet to view and/or save the just added keywords.
13.  Save the contents of the Keywords List as described under Saving Files Individually.
14.  Open the source code of the second page you saved in Step 1 (the source file of the page titled "(SST) ShlWAPIFunctionInfo Classes").
15.  Once it has been fully loaded, press the Run Button in the Button Tool Bar and wait for the word lists to be displayed on the Distinct Words List Tab Sheet
16.  In the pane on the right-hand side of the Distinct Words List Tab Sheet, select the words "Classes", "TAboutBox", and "TForm1".
17.  Add the selected words to the Keywords List, by selecting/clicking on the "Add Words to Keywords" menu item in the Distinct Words List's Context Menu.
18.  Switch to the Auto Find and Add Tab Sheet.
19.  Open the Assignment Menu in the Main Menu.
20.  Select/Click the Auto Find and Add Menu Item at the bottom of the Assignment Menu.
21.  Open the Auto Find and Add List's context menu through a click of the right mouse button.
22.  Select/click the Add Source File as Link Menu Item.
23.  Switch to the Keywords Tab Sheet to view and save the modifications made to the Keywords List.
In this case, the modifications will be less obvious than the previous two additions, because the last action will have primarily added information to the list's second and third columns.
24.  Using the same procedure as before, load the third source file (the HTML source file of the page with the title "Tools").
25.  Switch to the Stripped Contents Tab Sheet.
26.  Process the source file by either of the two already described methods or simply by pressing the function key F5 on your keyboard.
27.  In the Stripped Contents Editor, mark/select the phrase "Intel CPU native code".
28.  In the Edit Menu of the Main Menu, select/click the Add Selection as Keyword Menu Item.
29.  Position the cursor/caret before the word "Intel" in the Keyword Combo Box.
30.  Mark/select the words "Intel CPU".
31.  Open the Keyword Combo Box's context menu through a click of the right mouse button while the cursor is over the Keyword Combo Box.
32.  In the Keyword Combo Box's context menu, select/click "Cut".
33.  Position the cursor after the word "code" in the Keyword Combo Box and, by means of the keyboard, add a comma and a blank/space to the end of the text.
34.  Open the Keyword Combo Box's context menu again and select/click the Paste Menu Item. This ought to append the two words "Intel CPU" to the text "native code, ".
35.  Press the Add/Apply Button to the right of the Keyword Combo Box.
36.  Return to the Keywords Tab Sheet.
37.  Open the File Menu in the Main Menu.
38.  Select/click the Compiled Help Index File Menu Item in the sub-menu of the "Export as" menu item.
39.  In the Save As Dialog that should have opened in the previous step, select the folder in which you would like to save the Compiled Help (.hhk) index file.
40.  Enter a file name for the file in the Save As Dialog's File Name Combo Box.
41.  Close the dialog by clicking on its Save Button.
As the exported file is, in spite of its file extension, a plain text file, you can open, view, and edit it in any text editor. To acquaint yourself with the format, we recommend doing so. The HTML code below shows what it should look like when opened in one. If it is available to you, we also recommend opening the file in the Microsoft (Compiled) Help Workshop.
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<meta name="generator" content="SST HTMLStripper Version 0.3">
</head>
<body>
<ul>
<li><object type="text/sitemap">
<param name="Name" value="C_ONLINEHELPURL">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="C_SHLWAPIDLLNAME">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Classes">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Constants">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Contents">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Developer">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Functions">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Intel CPU native code">
<param name="Name" value="Tools">
<param name="Local" value="Tools.htm">
</object>
<li><object type="text/sitemap">
<param name="Name" value="IsValidHandle">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="July">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="last">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="mail">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="native code, Intel CPU">
<param name="Name" value="Tools">
<param name="Local" value="Tools.htm">
</object>
<li><object type="text/sitemap">
<param name="Name" value="PSSTWinResLanguageId">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Reference">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="ShlWAPIFunctionInfo">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="ShlWAPIFunctionInfoAbout">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="ShlWAPIFunctionInfoMain01">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Software">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Table">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TAboutBox">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TDllGetVersionProc">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TDllVersionInfo">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TDllVersionInfo2">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TForm1">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTAdvancedComboBox">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTAdvancedListView">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTAdvancedMemo">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTBasicTextSearchOptions">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTCharSetType">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTDllVerInfo">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTWinResLanguageId">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Types">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Units">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="updated">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Version">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
</ul>
</body>
</html>
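For readers who want to generate or check such an index outside the HTMLStripper, the following is a minimal Python sketch that writes a file in the same layout as the example above: one `<li><object type="text/sitemap">` entry per keyword, whose first Name parameter is the keyword itself, followed by one Name/Local pair per topic page. It is not the HTMLStripper's own export code. The dictionary layout, the function name `build_hhk`, and the case-insensitive sort order (which the example's ordering appears to follow) are our assumptions. It requires Python 3.9 or later.

```python
from html import escape

# Hypothetical data model (not taken from the HTMLStripper): each keyword maps
# to the (topic title, local file name) pairs it should point to.
Index = dict[str, list[tuple[str, str]]]


def build_hhk(index: Index, generator: str = "SST HTMLStripper Version 0.3") -> str:
    """Return the text of a .hhk index laid out like the exported example."""
    lines = [
        '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">',
        "<html>",
        "<head>",
        f'<meta name="generator" content="{escape(generator)}">',
        "</head>",
        "<body>",
        "<ul>",
    ]
    # Assumed ordering: case-insensitive, so "last" sorts between "July" and "mail".
    for keyword in sorted(index, key=str.casefold):
        lines.append('<li><object type="text/sitemap">')
        # The first Name parameter is the keyword shown in the index.
        lines.append(f'<param name="Name" value="{escape(keyword)}">')
        # Each topic the keyword points to adds a Name (title) / Local (file) pair.
        for title, local in index[keyword]:
            lines.append(f'<param name="Name" value="{escape(title)}">')
            lines.append(f'<param name="Local" value="{escape(local)}">')
        lines.append("</object>")
    lines += ["</ul>", "</body>", "</html>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = {
        "TForm1": [
            ("(SST) ShlWAPIFunctionInfo Classes", "(SST) ShlWAPIFunctionInfo Classes.html"),
        ],
        "native code, Intel CPU": [("Tools", "Tools.htm")],
    }
    print(build_hhk(sample), end="")
```

Because `html.escape` also converts quotation marks, ampersands, and angle brackets, page titles containing such characters still produce valid attribute values.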
Using the HTMLStripper with Unformatted Text Files
Although we generally don't use it to process files other than HTML/XML files, in other words formatted texts, the HTMLStripper can also be used to generate word lists from plain, unformatted texts, such as the "ReadMe.txt" file that is part of the setup package. Just as with HTML/XML files, it generates the two word lists, which can subsequently be used to create a keyword list from these files. Unfortunately, the word count feature does not always work entirely reliably with such files if they also contain HTML/XML tags (i.e. if the file is a sort of scratch-pad file).
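The sketch below does not describe how the HTMLStripper counts words. It merely illustrates, under our own assumptions, how tags left in a plain-text file can distort a naive word count: it builds an alphabetically sorted list of distinct words with their counts and, optionally, removes anything that looks like a tag first. The function name `distinct_words`, the word pattern (ASCII letters, digits, and underscores), and the crude `<...>` tag pattern are hypothetical simplifications.

```python
import re
from collections import Counter

# Hypothetical tokenisation rules, chosen for illustration only:
# a "word" is a run of ASCII letters, digits, or underscores (so C_ONLINEHELPURL
# stays whole), and a "tag" is anything between "<" and the next ">".
WORD = re.compile(r"[A-Za-z0-9_]+")
TAG = re.compile(r"<[^>]*>")


def distinct_words(text: str, strip_tags: bool = False) -> list[tuple[str, int]]:
    """Return (word, count) pairs, sorted alphabetically without regard to case."""
    if strip_tags:
        # Replace each tag with a space so the words on either side stay separate.
        text = TAG.sub(" ", text)
    counts = Counter(WORD.findall(text))
    return sorted(counts.items(), key=lambda item: (item[0].casefold(), item[0]))


if __name__ == "__main__":
    scratch = "Read <b>this</b> file. Read it twice."
    print(distinct_words(scratch))                   # the tag name "b" is counted twice
    print(distinct_words(scratch, strip_tags=True))  # only the prose words remain
```

A regular expression is sufficient here only because the sample is simple; tags containing a literal `>` inside an attribute value would defeat the `TAG` pattern.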
Using the HTMLStripper for Proofreading
Although we had already integrated a spell-checker into the HTMLStripper, we were forced to remove it because it supported only a single language, and it was clear that further languages would not be made available. We have not given up on the idea, however, and are looking into alternative solutions.
Even without an integrated spell-checker, the HTMLStripper has two features that simplify proofreading considerably:
1.  The Stripped Contents Editor
2.  The word lists on the Distinct Words List Tab Sheet.
Because all hidden, on-demand (triggered) texts, such as the cookie settings, are exposed, and distracting elements, such as background graphics and advertising images, are removed, grammatical and usage errors on an HTML page are easier to detect in the Stripped Contents Editor. Typographical errors, on the other hand, are easier to identify in the alphabetically sorted word lists on the Distinct Words List Tab Sheet. The following two scenarios illustrate the difference.
a. Scenario 1 (Context Required)
Consider the following two sentences:
Eve gave Adam an appeal from the tree of forbidden fruit. To prevent being banned from paradise, Adam requested a hearing before the court of apples.
Unless the sentences are from a satirical or nonsensical version of the Bible, the words "appeal" and "apples" were either misspelled or used in the wrong context (they should read "apple" and "appeals"), even though each of them is spelled correctly when examined individually (i.e. out of context).
As detecting these types of errors may require reading the entire page's text, they can be identified far more easily in the Stripped Contents Editor than in the word lists on the Distinct Words List Tab Sheet.
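Reading a page's text in context is easier once the markup has been removed. The following minimal Python sketch, built on the standard library's `html.parser` module, illustrates the general idea behind such a stripped view; it is not the HTMLStripper's algorithm. Because it collects every text node without evaluating CSS, text that a browser would only show on demand (for example inside an element styled `display:none`) ends up in the output as well, which is the effect described earlier for the Stripped Contents Editor. The class and function names are hypothetical, and skipping the contents of `script` and `style` elements is our own assumption.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML page, dropping all tags and attributes.

    CSS is not evaluated, so text hidden with e.g. style="display:none" is kept.
    The contents of <script> and <style> elements are skipped (our assumption).
    """

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text outside script/style, with runs of whitespace collapsed.
        if not self._skip_depth and data.strip():
            self.parts.append(" ".join(data.split()))


def strip_html(source: str) -> str:
    """Return the page's text, one text node per line."""
    parser = TextExtractor()
    parser.feed(source)
    parser.close()
    return "\n".join(parser.parts)


if __name__ == "__main__":
    page = (
        "<p>Eve gave Adam an appeal from the tree of forbidden fruit.</p>"
        '<div style="display:none">Cookie settings</div>'
        "<script>var shown = false;</script>"
    )
    print(strip_html(page))
```

Emitting one text node per line keeps block boundaries visible, at the cost of splitting sentences that contain inline markup such as `<b>` across several lines.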
b. Scenario 2 (Simple "Typos")
Consider the following abridged list of words:
accompanying
accounts
actual
acutal
all
also
...
Prerequisites
preserving
Previous
properies
Properties
purposes
...
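Even after going over this list only once, you will probably have noticed the two typographical errors, "acutal" and "properies". They stand out quite distinctly because, in the alphabetically sorted list, each misspelling sits directly next to its correct spelling, so the same word seems to occur twice. In fact, this example is taken from lists generated from two of our website's pages.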
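The effect relies on sorting: a misspelling often shares its first letters with the correct word, so alphabetical order places the two side by side. As a rough illustration only (not a feature of the HTMLStripper), the Python sketch below sorts a word list case-insensitively and flags adjacent entries whose similarity, as measured by the standard library's `difflib.SequenceMatcher`, reaches a threshold. The function name `suspicious_neighbours` and the threshold of 0.8 are arbitrary choices of ours, and the check will miss typos that change a word's first letters.

```python
from difflib import SequenceMatcher


def suspicious_neighbours(words: list[str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return adjacent pairs in a case-insensitively sorted, de-duplicated word
    list that look like near-duplicates of each other (possible typos)."""
    ordered = sorted(set(words), key=lambda w: (w.casefold(), w))
    pairs = []
    for first, second in zip(ordered, ordered[1:]):
        a, b = first.casefold(), second.casefold()
        if a == b:
            continue  # same word, different capitalisation (e.g. "Previous"/"previous")
        # ratio() is 2*M/T: M matched characters, T total characters of both words.
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    sample = [
        "accompanying", "accounts", "actual", "acutal", "all", "also",
        "Prerequisites", "preserving", "Previous", "properies", "Properties", "purposes",
    ]
    print(suspicious_neighbours(sample))
    # [('actual', 'acutal'), ('properies', 'Properties')]
```

As with reading the list by eye, the flagged pairs are only candidates; a human still has to decide which spelling is correct.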
Summary
Both features can be used individually or in combination with one another, but they still require a human reader to detect errors in the text or word lists.
We have adopted the following proofreading method for our pages' texts:
1.  We go down the list of words on the Distinct Words List Tab Sheet, looking for any "unusual" words and/or spellings.
2.  If we detect any, we either read the relevant parts very closely or review the page's entire text in the Stripped Contents Editor and/or on the correctly displayed page in a browser.
Footnotes
*1 Depending on the Windows version and edition, the caption that precedes the drop-down list can vary. Whereas it might read "Language" under one Windows version, it may read "Encoding" under another. Furthermore, the captions we have used to name the drop-down list are those of English Windows editions. The captions on the Save HTML Document Dialog, however, are always those of the primary user interface language under which the HTMLStripper is running.




Document/Contents version 1.00
Page/URI last updated on 22.10.2023
 
Copyright © Stoelzel Software Technologie (SST) 2010 - 2017
Suggestions and comments mail to:
webmaster@stoelzelsoftwaretech.com