This documentation is preliminary in nature. It applies to the prototype, version 0.3
of the application currently named HTMLStripper, the user interface and functionality
of which are likely to be subject to numerous changes in the prototype versions leading
up to the first, fully implemented, version 1.0.
It assumes that the user is familiar with, at least, the basics of using
Microsoft Windows applications.
Furthermore, as the guide does not provide information on how to integrate the generated
data into HTML/XML or Microsoft Compiled Help (.CHM) projects, the user should be proficient
in the use of the applications needed for that purpose to make full use of the described features.
Nonetheless, the guide does cover the most relevant aspects of using the application.
Specifically, these are:
• Acquiring and providing the source file(s)
• Converting the source file into an ANSI or 16-bit Unicode (UTF-16) encoded file,
  if necessary
• Opening the ANSI or Unicode encoded source file in the SST HTMLStripper application
• Deciding on and creating a list of words to suppress/ignore
• Stripping the source file of its tags
• Selecting and/or editing the words and/or phrases to add to the keywords list
• Adding words, phrases, etc., from the word list(s) and stripped contents to the
  keywords list
• Exporting the keywords as a keywords meta tag or a Microsoft Compiled Help (.hhk)
  keyword index file
• Saving the created and/or processed files
Acquiring and Providing the Source File(s)
Performing the following three steps is only necessary
if you can't access the source code files on a local or mapped network drive
(i.e. if they can only be accessed by means of an Internet browser, FTP program, etc.).
1. Open the URL of the page you want to create a keyword index or keyword meta tag for.
   Although this can be done in the integrated browser, the pages of many websites on the
   Internet are not displayed correctly in it.
   This is not an inadequacy or bug of the HTMLStripper prototype!
   It can be attributed exclusively to the fact that
   many web developers obviously don't deem it necessary to implement
   their websites for anything but the latest browser generation.
   However, performing this step in the integrated browser is merely a question of convenience,
   not of functionality. Opening the page in the browser you normally use
   will, in most cases, do just as well.
2. Once it has been fully loaded, save the page to a file on your hard disk (or some other
   storage medium that will make the HTML/XML source code file available on your computer).
   Irrespective of the browser you use, we recommend saving the file
   encoded in either the ANSI or the Windows Unicode character set.
   Although the prototype appears to work faultlessly with European languages encoded in UTF-8,
   this recommendation also applies to pages that contain characters not found in the English alphabet
   (e.g. German umlauts, French accents, etc.).
   It is also imperative that the page be saved as an HTML (or XML) file and therefore as a
   plain text file (even if the file extension/suffix is not .txt!).
   This should be borne in mind, because some browsers save web
   pages in their entirety (i.e. including graphics and everything else) in a compound
   file that can only be reliably opened in the browser with which it was created.
   If you're uncertain whether this is the case, we recommend checking it by opening
   the saved file in a text editor (e.g. Windows Notepad).
   If the file contains readable text together with a lot of undecipherable symbols,
   it is more than likely a compound file (for example, a .mht file).
   In such cases, simply save the file again in another format or using a different browser.
   If you decide to use the integrated browser, you should also be aware of the fact
   that, unlike most other browsers, it does not save the graphics and other files
   (e.g. scripts, style sheets, etc.) referenced in/by the source code to disk together with it.
   This has no detrimental effect on the ability of the HTMLStripper to process the saved page/file.
   It will merely result in the page not being displayed with all its visual (and/or acoustic) elements.
3. Repeat the steps just described for all files you want to process or include in
   your keyword index.
Using the Integrated Browser to Acquire and Save the Source Code
These steps only need to be performed if you want to use the Integrated Browser,
don't have access to the HTML (or XML) source code on a local (or LAN) drive,
and have not already performed the steps described under
Acquiring and Providing the Source File(s)
in another application.
Using the Integrated Browser of the HTMLStripper prototype has both drawbacks and advantages.
These are:
• The page may not be displayed correctly
  (see the remark above).
• When saving a page in the Integrated Browser,
  external files (e.g. graphics, videos, scripts, style sheets, etc.)
  are not saved with it, resulting in the page not being displayed correctly
  if subsequently opened in a browser.
• You don't have to switch between applications.
• If the page is displayed correctly in the Integrated Browser, it can be saved directly in the required
  format, making it unnecessary to convert the file into an ANSI or Unicode encoded file.
Should you decide to use the integrated browser (in spite of its current shortcomings),
proceed as follows:
1. Click the button-style "Browser" tab (below the combo boxes
   and Button Tool Bar) to switch to the Browser Tab Sheet.
2. Enter the Internet address (aka URL) of the page to open
   in the HTMLStripper's Source File(s) Combo Box.
3. Either select/click the Open URL Menu Item in the View Menu of the
   Main Menu or click the Open URL Button in the Button Tool Bar.
4. Wait for the page to be loaded in the Integrated Browser.
5. Open the File Menu in the Main Menu.
6. Select/click the File Menu's "Save as ..." Menu Item. This will open the
   Save HTML Document Dialog.
7. In the Language or Encoding Drop-Down List,
   at the very bottom of the Save HTML Document Dialog,
   select the character set and/or language in which the text that is displayed to users
   was authored. If the Language Drop-Down List provides multiple choices, you can select either
   an item with the supplement "(Windows)" or "(ISO)", or the "Unicode" item.
   Alternatively, if the page's text is displayed correctly with the preselected item,
   and this has one of the required supplements (e.g. "(Windows)"), you can simply
   continue with the next step.
8. In the Save as Type Drop-Down List, which is located immediately below the File Name Combo Box,
   select either the "HTML File (*.htm, *.html)" or the "Text File (*.txt)"
   item. Both file types can subsequently be opened/loaded as source files and processed by the
   HTMLStripper. However, if the latter (i.e. "Text File (*.txt)") is chosen,
   most browsers, including the Integrated Browser, will not render the page when the file is opened.
   In other words, the source code will be displayed as it is in the HTML Source File Viewer/Editor
   or any other editor.
9. Save the file to disk by clicking the Save Button.
Converting the Source File into an ANSI or Unicode Encoded File
Unfortunately, to produce correct output, the current prototype (version 0.3)
may still require the input source code to be converted manually into an ANSI or Unicode encoded file.
Whether this is necessary can depend on various factors, individually or in combination,
such as the language(s) in which the text exposed to the user was authored,
the operating system, its version, edition, and user interface language,
and the Internet (or other) application (e.g. browser, e-mail client, etc.)
used to create or save the source file.
However, in the event that a conversion is unavoidable, the text editor that ships with all
Microsoft Windows operating systems as of Windows Vista (Notepad) can, in most cases, be used to perform
the conversion. Here is how it's done:
1. Open the source file in Windows Notepad.
2. Open Notepad's File Menu and select/click the Save As Menu Item.
3. In the Save As Dialog, enter a slightly different name for the file in the
   dialog's File Name Combo Box. For example, you could simply append a capital
   A or U to the name part (not to the extension/suffix) of the file name to
   mark it as the ANSI or Unicode (encoded) version of the original file.
4. Depending on the Windows version you are using, select either the "ANSI", the
   "Unicode", or the "UTF-16 LE" item in the Encoding Drop-Down List at the very bottom
   of the dialog. The "Unicode" and "UTF-16 LE" items save the file in the required
   Unicode format; the "ANSI" item saves it as an ANSI encoded file.
5. Close the dialog by clicking its Save Button.
6. Open the file you have just created and saved in either another instance
   of Notepad or the Internet application for which it was developed, and verify
   that the contents are displayed correctly.
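If you have many files to convert, you can script the Notepad procedure above. The following Python sketch is not part of the HTMLStripper, and the function name and its parameters are hypothetical. It rests on two assumptions. First, the source file is UTF-8 encoded. Second, "ANSI" means the Windows-1252 code page, which holds for Western European Windows installations but not for all of them. "Unicode" is written as UTF-16 little-endian with a byte order mark, which is what Notepad's "Unicode"/"UTF-16 LE" items produce.

```python
import codecs
from pathlib import Path


def convert_source_file(src: Path, dst: Path, target: str = "unicode",
                        src_encoding: str = "utf-8-sig") -> None:
    """Re-encode an HTML/XML source file (hypothetical helper, not part of HTMLStripper).

    target="unicode": UTF-16 little-endian with a byte order mark, as written
                      by Notepad's "Unicode" / "UTF-16 LE" items.
    target="ansi":    Windows-1252, ASSUMED here to be the system's ANSI code page.
    """
    # Decode the raw bytes so that the original line endings (e.g. CRLF) are kept.
    # "utf-8-sig" accepts UTF-8 files both with and without a byte order mark.
    text = src.read_bytes().decode(src_encoding)
    if target == "unicode":
        dst.write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
    elif target == "ansi":
        # Strict encoding raises UnicodeEncodeError for characters that
        # Windows-1252 cannot represent, instead of silently replacing them.
        dst.write_bytes(text.encode("cp1252"))
    else:
        raise ValueError(f"unknown target encoding: {target!r}")
```

For example, `convert_source_file(Path("page.htm"), Path("pageU.htm"))` writes the Unicode version and follows the naming convention suggested in step 3. The "ansi" target deliberately fails on characters that Windows-1252 cannot represent, rather than substituting question marks the way a lossy save would.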
Opening the ANSI or Unicode Encoded Source File in the HTMLStripper App
Once you have a source file in one of the two currently fully supported formats
(i.e. either an ANSI or a Windows Unicode encoded HTML or XML file), you can load it
into the HTMLStripper's Source Code Viewer/Editor.
1. To open the dialog in which you can select the HTML (or XML) source file, use either
   the "Select Source File(s) ..." menu item in the File Menu of the Main Menu
   or the corresponding button in the (Button) Tool Bar. A click on either one will open the
   Open File Dialog.
2. Using the controls at the top of the Open File Dialog,
   select and open the folder in which the source file(s) you want to process is/are located.
3. Select the file you want to process in the Open File Dialog's List View.
4. Close the dialog by means of its Open Button.
5. Verify that the source code is displayed in the viewer/editor
   on the HTMLStripper's HTML Source File Tab Sheet.
Deciding on and Creating a List of Words to Suppress/Ignore
Normally, opening/loading and/or editing the list of words to suppress/ignore would be the next step,
prior to reducing the source code to the text(s) exposed to the user.
However, initially you are unlikely to know (exactly) which words were used and how often they
occur in the texts you are processing, nor how relevant (or irrelevant) they may be as keywords in
a keyword meta tag or an index.
Furthermore, adding the words to ignore, word by word, would be tedious, to say the least.
It is far easier to copy all the words used in the texts from the generated lists of distinct words,
and then remove those you consider relevant, or at least potentially relevant.
Nonetheless, so that you can see the effect the list of words to suppress/ignore has on the
resulting list of distinct words, you might want to suppress words such as "menu",
"cookie", or any other common words that you can think of and that are sure to occur
in the texts of your first source files. To do this:
1. Switch to the "Suppress(ed) Words" Tab Sheet by clicking the button-style
   "Suppress(ed) Words" tab immediately below the combo boxes
   and Button Tool Bar, in the upper part of the main window.
2. Enter each word you want to suppress/ignore on a separate line.
3. When you're done, save the list by selecting/clicking the "Save as ..." menu item
   in the File Menu of the Main Menu or by pressing the Save As Button in the Button Tool Bar.
When you process the first source file(s), the words you have just entered in the
Suppress(ed) Words Editor will not appear in the list of words on the
Distinct Words Tab Sheet (in other words, they will have been suppressed).
Saving the list will also result in it being loaded automatically when
you run the HTMLStripper.
To use a different list with other source files, simply load the desired list by:
1. switching to the "Suppress(ed) Words" Tab Sheet, as already described,
2. selecting/clicking the "Open ..." menu item in the File Menu of the Main Menu
   to display the Open File Dialog,
3. using the controls at the top of the Open File Dialog
   to open the folder in which the suppress(ed) words list
   text file is located,
4. selecting the suppress(ed) words list file you want to use
   in the Open File Dialog's List View,
5. and closing the dialog by means of its Open Button.
Processing the Source Code
Processing the source code (stripping it of its tags and generating the
list of distinct words) is probably the easiest part of the whole procedure.
Once the source file has been loaded into the
Source Code Editor, all you have to do is:
1. Switch to the Stripped Contents or Distinct Words List Tab Sheet.
2. Select/click Run in the Assignment Menu of the Main Menu, or click the
   corresponding button in the (Button) Tool Bar.
3. Wait for the stripped contents to be displayed in the Stripped Contents Editor
   or in the lists on the Distinct Words List Tab Sheet.
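This guide does not document how the HTMLStripper strips tags internally. The following Python sketch only illustrates roughly what the Run command produces, and all of its names are hypothetical. It uses the standard library's `html.parser` to collect the text outside of tags and skips `<script>` and `<style>` content. It then counts the distinct words that are not on a suppression list. The sketch makes these assumptions:
• Suppression is case-insensitive.
• A word is a run of letters, digits, and underscores, so identifiers such as C_ONLINEHELPURL stay whole.
• A suppression list is an ANSI (here: Windows-1252) text file with one word per line.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text outside of tags, skipping <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. automatically
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_tags(html: str) -> str:
    """Reduce HTML source to the text exposed to the user, on a single line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def load_suppressed_words(path: str, encoding: str = "cp1252") -> set[str]:
    """Read a suppression list (ASSUMED format): one word per line, blank lines ignored."""
    with open(path, encoding=encoding) as f:
        return {line.strip() for line in f if line.strip()}


def distinct_words(text: str, suppressed: set[str]) -> Counter:
    """Count distinct words, dropping suppressed ones (ASSUMED case-insensitive)."""
    suppressed_lower = {w.lower() for w in suppressed}
    words = re.findall(r"\w+", text)
    return Counter(w for w in words if w.lower() not in suppressed_lower)
```

Joining the text fragments with a space keeps adjacent block elements such as paragraphs apart. The cost is that a word split by inline tags (e.g. `<b>H</b>elp`) ends up as two words.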
Adding Keywords to the Keywords List
The current prototype of the HTMLStripper provides five simple methods
by which keywords (and/or phrases) can be added to the list of keywords in the Keywords List View
(which, as you'll already have guessed, is located on the Keywords Tab Sheet).
The first two methods are ideal if you have a lot of unusual, individual words, for example
if you're a software developer who is documenting types, variables, classes,
and functions, all of which normally consist of one word only.
The third method provides greater flexibility in that phrases can also be added as keywords,
directly from the text on the Stripped Contents Tab Sheet.
The fourth method is particularly useful in combination with the third method,
in that it (somewhat) simplifies creating the inverted forms of keyword phrases.
The fifth method is essentially the basis of the fourth method and is the simplest of all.
Common to all methods is that you can add words and/or phrases
as often as you like. However, a particular word or phrase is only added to the
Keywords List View the first time you add it.
If you retain your keyword list, process a second source file, and add
a word (or phrase) that is already in the list, the name of the source file and its
title will be added to the respective columns of the existing keyword.
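The duplicate handling just described can be modelled as a dictionary keyed by the keyword text. In the hypothetical Python sketch below, the class and method names are ours, not the HTMLStripper's. The sketch assumes the following:
• Keywords are matched exactly, including case.
• Every addition records the current source file and its title.
• A file that has already been recorded for a keyword is not recorded twice.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordEntry:
    """One row of the Keywords List: the keyword plus the files/titles it links to."""
    keyword: str
    source_files: list[str] = field(default_factory=list)
    titles: list[str] = field(default_factory=list)


class KeywordList:
    """Hypothetical model of the Keywords List's duplicate handling."""

    def __init__(self) -> None:
        # dict preserves insertion order, i.e. the order keywords were first added
        self._entries: dict[str, KeywordEntry] = {}

    def add(self, keyword: str, source_file: str, title: str) -> bool:
        """Add `keyword` for the given source file; return True only if it is new.

        An existing keyword is not duplicated (ASSUMED exact, case-sensitive match);
        instead the source file and its title are appended to its columns once.
        """
        entry = self._entries.get(keyword)
        is_new = entry is None
        if entry is None:
            entry = KeywordEntry(keyword)
            self._entries[keyword] = entry
        if source_file not in entry.source_files:
            entry.source_files.append(source_file)
            entry.titles.append(title)
        return is_new

    def entries(self) -> list[KeywordEntry]:
        return list(self._entries.values())
```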
Method 1 (Selecting a Range of Text in the Distinct Words Editor)
1. In the Distinct Words Editor on the left-hand side of the Distinct Words Tab Sheet,
   select the range of text that contains the words you want to add to your list of keywords.
   Text can be selected in the Distinct Words Editor as in any other editor or word processor,
   either by positioning the mouse cursor in it and dragging it over the text while keeping the
   left mouse button pressed, or by means of the keyboard's Shift and arrow keys.
   Alternatively, you can delete all those words you don't want as keywords and select/click
   the Select All menu item in either the Edit Menu of the Main Menu or the
   Distinct Words Tab Sheet's Context Menu.
2. Select/click the Add Words to Keywords Menu Item in either the Edit Menu of the
   Main Menu or the Distinct Words Tab Sheet's Context Menu.
   This will add all words that occur in the selected text range as individual keywords
   to the Keywords List on the Keywords Tab Sheet.
Method 2 (Selecting Words in the Distinct Words List)
1. In the Distinct Words List on the right-hand side of the Distinct Words Tab Sheet,
   select the words that you want to add to your list of keywords.
   Individual words in the Distinct Words List are selected simply by clicking them.
   Multiple non-adjacent words can be selected by holding down the Control (Ctrl) Key
   on the keyboard and clicking the individual words you want to select.
   A range of adjacent words can be selected by holding down the Shift Key,
   clicking the first (or last) word of the range and then, while keeping the Shift Key pressed,
   clicking its last (or first) word.
2. As in Method 1, above, selecting/clicking the Add Words to Keywords Menu Item
   in either the Edit Menu of the Main Menu or the Distinct Words Tab Sheet's Context Menu
   will add the selected words to the Keywords List on the Keywords Tab Sheet.
Selecting and Editing Words and/or Phrases to Add to the Keywords List
Apart from adding words from the list of distinct words,
you can also select individual words and phrases in the Stripped Contents
and add them as keywords.
Method 3 (Selecting a Text Segment in the Stripped Contents Editor)
1. With a click of the mouse, position the cursor/caret in the Stripped Contents Editor
   and mark/select the word or phrase you would like to add to the list of keywords.
   Note: should the word or phrase extend over several lines, the HTMLStripper will reduce the
   selected text range to a single line prior to adding it to the Keywords List.
2. Open either the Edit Menu in the Main Menu or the Stripped Contents Editor's
   Context Menu.
3. In whichever of the two menus you have opened, select/click the
   "Add the Selection as Keyword" menu item.
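Methods 1 and 3 treat a selection differently. Method 1 makes each entry in the selected range a keyword of its own. Method 3 makes the whole selection one keyword and reduces it to a single line. The hypothetical Python helpers below sketch that difference. They assume two things, and the HTMLStripper's actual rules may differ:
• In the Distinct Words Editor, each non-empty line holds one word or phrase, as in the template alternative of Method 5.
• In the Stripped Contents Editor, a line break within the selection is replaced by a single space.

```python
def keywords_from_lines(selection: str) -> list[str]:
    """Method 1 style (ASSUMED): every non-empty line of the selection becomes
    its own keyword; duplicates are dropped and the original order is kept."""
    keywords: dict[str, None] = {}  # dicts preserve insertion order
    for line in selection.splitlines():
        keyword = " ".join(line.split())  # trim and collapse inner whitespace
        if keyword:
            keywords.setdefault(keyword, None)
    return list(keywords)


def phrase_from_selection(selection: str) -> str:
    """Method 3 style (ASSUMED): the whole selection becomes one keyword,
    with line breaks and runs of whitespace reduced to single spaces."""
    return " ".join(selection.split())
```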
Method 4 (Selecting a Text Segment in the Stripped Contents Editor and Editing It)
This method was primarily devised to make it a little easier to
create inverted (keyword) phrases and add them together with the original phrase
in more or less one step.
1. As in Step 1 of Method 3, position the cursor/caret in the Stripped Contents Editor
   and select the word or phrase you want to add to the Keywords List.
   As you may have noticed, when selecting text in the Stripped Contents Editor,
   the selected text is replicated in the Keyword Combo Box,
   which, by default, is situated immediately above the Tab Sheets.
   This has three advantages when adding a phrase as a keyword:
   • If you have selected a phrase that extends over several lines, it is reduced to a
     single line in the Keyword Combo Box.
   • If you want to rephrase (e.g. invert) the phrase, the mouse cursor is already close to
     the combo box, which contains the text you will need to modify.
   • You don't have to open a menu and select the appropriate menu item to add the selected
     (or modified) text as a keyword.
     It can be added simply by pressing the Add/Apply Button to the right of the
     Keyword Combo Box.
2. Press the Add/Apply Button (located on the right-hand side of the Keyword Combo Box)
   to add the unmodified phrase (which you have marked/selected in the Stripped Contents Editor)
   to the Keywords List.
3. Position the cursor in the Keyword Combo Box and make the desired
   changes to the word or phrase.
   For example, you might want to invert the original phrase you have just
   added in Steps 1 and 2.
   If this phrase were "system level applications",
   all you would have to do is cut "system level" from the text in the
   Keyword Combo Box and append it, preceded by a comma and a space, to the sole remaining word
   "applications". The resulting inverted phrase in the Keyword Combo Box
   would then be "applications, system level".
4. Press the Add/Apply Button again.
   This will add the inverted or otherwise modified phrase (or word)
   to the Keywords List as well.
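The manual inversion in Step 3 amounts to moving the leading words of a phrase to its end, behind a comma. The following hypothetical Python function, which is not part of the HTMLStripper, sketches that rearrangement. The number of words to move is passed explicitly, because deciding which words form the "head" of a phrase is an editorial choice.

```python
def invert_phrase(phrase: str, leading_words: int) -> str:
    """Move the first `leading_words` words of `phrase` to its end, after a comma,
    e.g. "system level applications" -> "applications, system level"."""
    words = phrase.split()
    if not 0 < leading_words < len(words):
        raise ValueError("leading_words must leave at least one word on each side")
    head = " ".join(words[:leading_words])
    tail = " ".join(words[leading_words:])
    return f"{tail}, {head}"
```

For example, `invert_phrase("Intel CPU native code", 2)` yields "native code, Intel CPU", which is the inverted form created by hand in the step-by-step example at the end of this guide. As with the Add/Apply Button, you would add both the original phrase and its inverted form to the Keywords List.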
Method 5 (Entering the Keyword Directly in the Keyword Combo Box or Distinct Words Editor)
Because it is the only method that does not require processing a source file first,
and can even be applied without one altogether, this method is ideally suited to
creating template keyword lists and files.
This is useful if you have numerous "keyword projects"
in which you want to include certain words and/or phrases in all or a
group of meta tags and/or index files.
1. Enter the word or phrase you want to add to the keywords template list in
   the Keyword Combo Box.
2. Press the Add/Apply Button to add the word or phrase to the
   Keywords List on the Keywords Tab Sheet.
3. Repeat steps 1 and 2 as often as necessary to complete your template.
4. Save the Keywords List as described under ...
Alternatively, if you have a large number of words and/or phrases you want
to include by default in your index projects, you can achieve the same result by:
1. switching to the Distinct Words Tab Sheet,
2. positioning the cursor/caret in the Distinct Words Editor on
   the left-hand side of the Distinct Words Tab Sheet,
3. selecting and deleting any remnants of previously generated lists,
4. entering the words and/or phrases, one word or phrase per line,
   in the Distinct Words Editor,
5. selecting the entire text in the Distinct Words Editor,
6. opening the Main Menu's Edit Menu or the Distinct Words Editor's Context Menu and
   selecting/clicking "Add Words to Keywords",
7. and saving the keywords list to a file, as described under
   Saving the Created and/or Processed Files,
   below.
Deleting Text, Words, and Keywords
With the exception of the HTML Source Code Viewer/Editor
(which is currently still only a viewer and not an editor)
and the list on the Auto Find and Add Tab Sheet,
it is possible to delete all texts, phrases, and words, including the keywords on the Keywords Tab Sheet,
down to individual characters/letters.
However, in version 0.3, the methods by which this can be achieved still differ.
Whereas the words and/or phrases in the Keywords and Distinct Words lists
must be deleted by means of the Delete menu items in the Main Menu, the
respective list's context menu, or the Delete Button in the Button Tool Bar,
the texts in most of the other controls can only be deleted by means of the
Delete (Del) Key on the keyboard.
The following table summarizes the issue.
Control | Delete Key on Keyboard | Delete Menu Item in Main Menu | Delete Menu Item in Context Menu | Delete Button in Button Tool Bar
--- | --- | --- | --- | ---
Source File(s) Combo Box | Yes | No | Yes | No
Keyword Combo Box | Yes | No | Yes | No
HTML Source Code Viewer/Editor | No | No | No | No
Integrated Browser | No | No | No | No
Stripped Contents Editor | Yes | No | No | No
Distinct Words Editor | Yes | No | No | No
Distinct Words List View | No | Yes | Yes | Yes
Keywords List View | No | Yes | Yes | Yes
Auto Find and Add List View | No | No | No | No
Suppressed Words Editor | Yes | No | No | No
Log | Yes | No | No | No
Exporting the Keywords List
Once you're done with adding words, phrases, and references/links to the Keywords List,
you can export it as an HTML keywords meta tag and/or a
Microsoft Compiled Help (.hhk) index file.
To do either or both:
1. Open the File Menu in the Main Menu.
2. Open the "Export as" menu item's sub-menu
   (by moving the cursor over the "Export as" menu item).
3. Select/click the Keywords Meta Tag or Compiled Help Index File menu item
   to save the keywords either as a keywords meta tag or as an .hhk index file, respectively.
   Both menu items will open the Save As Dialog with the appropriate
   file type preselected in the Save as Type Drop-Down List.
4. In the Save As Dialog, use the controls at the top of the dialog to
   navigate to the folder in which you want to save the respective file.
5. Enter the file name under which you would like to save the exported file
   in the File Name Combo Box. The File Name Combo Box is located in the lower
   half of the dialog, between the dialog's File List View (above it) and the
   Save as Type Drop-Down List (below it).
   Alternatively, you can simply retain the suggested file name
   that is already entered in the File Name Combo Box.
6. Close the dialog by means of its Save Button.
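To illustrate what the two export formats contain, the following Python sketch generates both from keyword data. It is not the HTMLStripper's exporter. The function names are hypothetical, and so is the input structure: a keyword paired with a list of (title, file) pairs, mirroring the Keywords List's columns. The .hhk output follows the HTML Help sitemap layout of the sample at the end of this guide. A first `Name` parameter holds the keyword, and each linked topic adds a `Name` parameter (its title) and a `Local` parameter (its file).

```python
from html import escape


def keywords_meta_tag(keywords: list[str]) -> str:
    """Return a keywords meta tag listing the keywords, separated by commas."""
    return f'<meta name="keywords" content="{escape(", ".join(keywords))}">'


def hhk_index(entries: list[tuple[str, list[tuple[str, str]]]]) -> str:
    """Return the text of an HTML Help index (.hhk) file.

    Each entry is (keyword, [(topic title, local file), ...]), an ASSUMED
    structure mirroring the Keywords List's keyword, title, and file columns.
    """
    lines = [
        '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">',
        "<html>",
        "<head>",
        "</head>",
        "<body>",
        "<ul>",
    ]
    for keyword, topics in entries:
        lines.append('<li><object type="text/sitemap">')
        lines.append(f'<param name="Name" value="{escape(keyword)}">')
        for title, local_file in topics:
            lines.append(f'<param name="Name" value="{escape(title)}">')
            lines.append(f'<param name="Local" value="{escape(local_file)}">')
        lines.append("</object>")
    lines += ["</ul>", "</body>", "</html>"]
    return "\n".join(lines) + "\n"
```

The keywords meta tag separates keywords with commas. As a result, an inverted phrase such as "applications, system level" reads like two separate keywords there. In the .hhk file, each keyword is a separate entry, so no such ambiguity arises.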
Saving the Created and/or Processed Files
With the exception of the source file, the page displayed in the Integrated Browser,
and the list on the Auto Find and Add Tab Sheet, the loaded and generated files can be saved
both individually and collectively.
Whereas the source file and the keywords in the Auto Find and Add List cannot
be saved at all, the pages/files in the Integrated Browser can be saved, but only
individually, on a per page/file basis.
Exported files, too, can only be saved individually, on a per file basis.
Saving the Page(s)/File(s) Displayed in the Integrated Browser
Note: the first three steps only have to be performed if the page
has not already been loaded or you have not visited it during your
current HTMLStripper session.
1. Enter the full URL (or the path and name) of the page to save
   in the Source File(s) Combo Box.
2. Press/click the Open URL Button in the Button Tool Bar.
3. Wait for the page to be displayed in the Integrated Browser.
4. Should you have switched to another tab sheet, return to/reopen the Browser Tab Sheet.
5. Open the File Menu in the Main Menu.
6. In the File Menu, select/click the "Save as ..." Menu Item
   to open the Save HTML Document Dialog.
7. In the Language (or Encoding) Drop-Down List
   of the Save HTML Document Dialog, select the character set/language in which
   the text exposed to users was authored.
8. In the Save as Type Drop-Down List, select the file type in which you want to save
   the source code of the page in the Integrated Browser.
9. Close the Save HTML Document Dialog by means of its Save Button.
Saving Files Individually
The files the HTMLStripper generates and/or the modifications you make to the files
on the Stripped Contents, Distinct Words List, Keywords, Suppress(ed) Words, and Log
tab sheets can be saved on a per tab sheet basis by:
1. switching to the tab sheet on which the file you want to save is located,
2. opening the Main Menu's File Menu, and
3. selecting/clicking the Save or Save As Menu Item.
If you have not previously saved a particular tab sheet's file(s),
the Save Menu Item will automatically open the Save As Dialog, in which you can
select and open the folder in which you want to save the file, specify a file name,
and select the file type in which you want it to be saved. Otherwise, the previously
saved file's contents will be replaced by the current contents.
The Save As Menu Item always opens the
Save As Dialog, irrespective of whether the file has already been saved or not.
Saving Files Collectively
When processing multiple source files, it would be a nuisance to have to
save the file on each tab sheet individually. Furthermore, doing so would
be error-prone, in that saving the contents of a tab sheet could easily be forgotten.
This can be avoided by saving the files collectively.
Saving the files collectively is essentially equivalent to switching to each
tab sheet and saving it individually, except that this is performed automatically.
To save the files collectively:
1. Open the File Menu in the Main Menu.
2. Select/click the Save All Menu Item.
3. If the Save As Dialog is opened because a file needs to be saved for the first
   time and you do not want to skip saving it, proceed as if you were saving
   the file individually. Otherwise (i.e. if you don't want to save the file),
   simply close the dialog by means of its Cancel Button.
   The HTMLStripper will then continue to save the remaining files (if any).
In this provisional version of the HTMLStripper, it is only possible
to open/load files into the viewers/editors on the following tab sheets:
• HTML Source File
• Browser
• Stripped Contents
• Keywords
• Suppress(ed) Words
It is not yet possible to load the other files produced by the HTMLStripper into the appropriate
viewers/editors on the other tab sheets (e.g. Distinct Words, Log, etc.).
Furthermore, files loaded into the Stripped Contents and Suppress(ed) Words Editors
have to be ANSI encoded. Only the Source Code Viewer/Editor on the HTML Source File Tab Sheet
and the Integrated Browser are currently capable of handling and displaying Unicode files as well.
Attempting to load Unicode encoded files into the viewers/editors on the
Stripped Contents and/or Suppress(ed) Words tab sheets will result in the files not being displayed correctly.
Editing and saving these files will also fail and/or may lead to data corruption.
Nonetheless, within the limitations imposed by the current functionality,
files can be loaded into the respective viewers/editors much like they are saved individually,
that is, by:
1. switching to the tab sheet on which the viewer/editor
   in which the file should be opened is located,
2. opening the File Menu in the Main Menu,
3. selecting/clicking the Open Menu Item in the File Menu to
   display the Open Dialog,
4. selecting the file to open in the Open Dialog's
   File List View (located in the middle of the dialog),
   or entering its name in the File Name Combo Box
   in the lower half of the dialog, above the Files of Type Drop-Down List,
5. and closing the Open Dialog by means of a click on the Open Button.
Creating a Compiled Help (.hhk) Index File, Step by Step
1. Open the following three pages on our website and save each page as a
   separate, ideally complete, HTML page to an easily accessible location on your computer.
2. Open the Open File Dialog to select the first source file to process
   by selecting/clicking the "Select Source File(s) ..." menu item in the
   File Menu of the Main Menu.
3. In the Open File Dialog's File List View, select the first page you saved in Step 1.
   This should be the file to which you saved the
   "ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents" page.
4. Close the Open File Dialog by means of its Open Button.
5. Wait for the source code to be displayed on the HTML Source File Tab Sheet.
6. Switch to the Distinct Words List Tab Sheet.
7. Process the file by opening the Assignment Menu in the Main Menu and
   selecting/clicking the Run Menu Item.
8. Wait for the word lists to be displayed in both panes of the
   Distinct Words List Tab Sheet.
9. In the left pane of the Distinct Words List Tab Sheet, select the entire text
   from the word C_ONLINEHELPURL downward.
10. Open the Edit Menu in the Main Menu.
11. Select/click the Add Words to Keywords Menu Item.
12. Switch to the Keywords Tab Sheet to view and/or save the keywords just added.
13. Save the contents of the Keywords List as described under
    Saving Files Individually.
14. |
Open the source code of the second page you saved in Step 1
(the source file of the page titled "(SST) ShlWAPIFunctionInfo Classes").
|
15. |
Once it has been fully loaded, press the Run Button in the Button Tool Bar and wait
for the word lists to be displayed on the Distinct Words List Tab Sheet
|
16. |
In the pane on the right-hand side of the Distinct Words List Tab Sheet, select the
words, "Classes", "TAboutBox", and "TForm1".
|
17. |
Add the selected words to the Keywords List, by selecting/clicking on the
"Add Words to Keywords" menu item in the Distinct Words List's Context Menu.
|
18. |
Switch to the Auto Find and Add Tab Sheet.
|
19. |
Open the Assignment Menu in the Main Menu.
|
20. |
Select/Click the Auto Find and Add Menu Item at the bottom of the Assignment Menu.
|
21. |
Open the Auto Find and Add List's context menu through a click of the right mouse button.
|
22. |
Select/click the Add Source File as Link Menu Item.
|
23. |
Switch to the Keywords Tab Sheet to view and save the modifications made to the Keywords List.
In this case, the modifications will be less obvious than the previous two additions,
because the last action will have primarily added information to the list's second and third columns.
|
24. Using the same procedure as before, load the third source file (the HTML source file of the page titled "Tools").
25. Switch to the Stripped Contents Tab Sheet.
26. Process the source file by either of the two methods already described, or simply by pressing the F5 function key on your keyboard.
27. In the Stripped Contents Editor, mark/select the phrase "Intel CPU native code".
28. In the Edit Menu of the Main Menu, select/click the Add Selection as Keyword Menu Item.
29. Position the cursor/caret before the word "Intel" in the Keyword Combo Box.
30. Mark/select the words "Intel CPU".
31. With the mouse cursor over the Keyword Combo Box, click the right mouse button to open the combo box's context menu.
32. In the Keyword Combo Box's context menu, select/click "Cut".
33. Position the cursor after the word "code" in the Keyword Combo Box and, using the keyboard, add a comma and a blank/space to the end of the text.
34. Open the Keyword Combo Box's context menu again and select/click the Paste Menu Item. This ought to append the two words "Intel CPU" to the text "native code, ".
35. Press the Add/Apply Button to the right of the Keyword Combo Box.
36. Return to the Keywords Tab Sheet.
37. Open the File Menu in the Main Menu.
38. In the sub-menu of the "Export as" menu item, select/click the Compiled Help Index File Menu Item.
39. In the Save As Dialog that ought to have opened in the previous step, select the folder in which you would like to save the Compiled Help (.hhk) index file.
40. Enter a name for the file in the Save As Dialog's File Name Combo Box.
41. Close the dialog by clicking its Save Button.
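Steps 29 to 34 rearrange the selected phrase into an inverted index entry, so that the topic can be found under "native code" as well as under "Intel CPU" (both entries appear in the exported index). The same cut-and-paste rearrangement can be sketched in a few lines of Python. This is a hypothetical illustration only: the function `invert_phrase` and its `leading_words` parameter are not part of the HTMLStripper.

```python
def invert_phrase(phrase: str, leading_words: int) -> str:
    """Move the first `leading_words` words of `phrase` behind a comma.

    Hypothetical helper mirroring steps 29-34: cut the leading words,
    append ", " to the remaining text, then paste the cut words at the end.
    """
    words = phrase.split()
    # Both parts must contain at least one word, otherwise there is nothing to invert.
    if not 0 < leading_words < len(words):
        raise ValueError("leading_words must leave at least one word on each side")
    moved, rest = words[:leading_words], words[leading_words:]
    return " ".join(rest) + ", " + " ".join(moved)


print(invert_phrase("Intel CPU native code", 2))  # prints: native code, Intel CPU
```

In the application itself the split point is chosen by hand when you select the words to cut, which is why the sketch takes it as an explicit parameter rather than trying to guess it.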
Because the exported file is a plain text file (in spite of its file extension), you can open, view, and edit it in any text editor, and we recommend doing so to acquaint yourself with the format.
The HTML code below shows what it should look like when opened in one.
If it is at your disposal, we also recommend opening it in Microsoft HTML Help Workshop.
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<meta name="generator" content="SST HTMLStripper Version 0.3">
</head>
<body>
<ul>
<li><object type="text/sitemap">
<param name="Name" value="C_ONLINEHELPURL">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="C_SHLWAPIDLLNAME">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Classes">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Constants">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Contents">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Developer">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Functions">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Intel CPU native code">
<param name="Name" value="Tools">
<param name="Local" value="Tools.htm">
</object>
<li><object type="text/sitemap">
<param name="Name" value="IsValidHandle">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="July">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="last">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="mail">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="native code, Intel CPU">
<param name="Name" value="Tools">
<param name="Local" value="Tools.htm">
</object>
<li><object type="text/sitemap">
<param name="Name" value="PSSTWinResLanguageId">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Reference">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="ShlWAPIFunctionInfo">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="ShlWAPIFunctionInfoAbout">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="ShlWAPIFunctionInfoMain01">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Software">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Table">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TAboutBox">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TDllGetVersionProc">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TDllVersionInfo">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TDllVersionInfo2">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TForm1">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTAdvancedComboBox">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTAdvancedListView">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTAdvancedMemo">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTBasicTextSearchOptions">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTCharSetType">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTDllVerInfo">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTWinResLanguageId">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Types">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Units">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="updated">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Version">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
</ul>
</body>
</html>
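To illustrate the structure of the exported index, the following Python sketch writes a file in the same layout as the example above. Each keyword becomes one `<li><object type="text/sitemap">` entry whose first Name parameter is the keyword itself, followed by one Name/Local pair per linked page, and the keywords are sorted alphabetically without regard to case, as in the example. This is a hypothetical illustration, not the HTMLStripper's export code: the function name `build_hhk`, the dictionary layout, and the HTML escaping of attribute values are assumptions.

```python
from html import escape


def build_hhk(keywords, generator="SST HTMLStripper Version 0.3"):
    """Return the text of a Compiled Help (.hhk) index file.

    `keywords` maps each keyword to a list of (title, local_file) pairs,
    roughly corresponding to the columns of the Keywords List (assumption).
    """
    lines = [
        '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">',
        "<html>",
        "<head>",
        f'<meta name="generator" content="{escape(generator)}">',
        "</head>",
        "<body>",
        "<ul>",
    ]
    # Sort case-insensitively, e.g. "Classes" < "native code, ..." < "TForm1".
    for keyword in sorted(keywords, key=str.casefold):
        lines.append('<li><object type="text/sitemap">')
        lines.append(f'<param name="Name" value="{escape(keyword)}">')
        # One Name/Local pair per page the keyword links to.
        for title, local in keywords[keyword]:
            lines.append(f'<param name="Name" value="{escape(title)}">')
            lines.append(f'<param name="Local" value="{escape(local)}">')
        lines.append("</object>")
    lines += ["</ul>", "</body>", "</html>"]
    return "\n".join(lines) + "\n"


toc = ("ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents",
       "ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html")
classes = ("(SST) ShlWAPIFunctionInfo Classes", "(SST) ShlWAPIFunctionInfo Classes.html")
tools = ("Tools", "Tools.htm")

index = {
    "TForm1": [toc, classes],
    "native code, Intel CPU": [tools],
    "Classes": [toc, classes],
}
print(build_hhk(index))
```

Keeping the links as a list per keyword makes it straightforward to append a further page to every keyword, which is the kind of change the Add Source File as Link Menu Item makes to the Keywords List.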
Using the HTMLStripper with Unformatted Text Files
Although we generally use it only to process HTML/XML files, in other words formatted texts, the HTMLStripper can also be used to generate word lists from plain, unformatted texts, such as the "ReadMe.txt" file that is part of the setup package.
Just as with HTML/XML files, it will generate the two word lists, which can subsequently be used to create a keyword list from these files.
Unfortunately, the word count feature does not always work entirely reliably with such files if they also contain HTML/XML tags (i.e. if the file is a sort of scratch-pad file).
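One way in which stray tags can distort the counts is that tag names end up being counted as words. The following Python sketch builds a distinct, case-insensitively sorted word list with occurrence counts from a plain text, optionally removing anything that looks like a tag first. It is a hypothetical illustration and does not reflect how the HTMLStripper tokenises text: the function name `distinct_words` and both regular expressions (what counts as a "word" and as a "tag") are assumptions.

```python
import re
from collections import Counter

# Assumption: a word starts with a letter and may contain letters, digits and "_".
WORD = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
# Assumption: a crude tag matcher; it does not handle ">" inside attribute values.
TAG = re.compile(r"<[^>]*>")


def distinct_words(text, strip_tags=False):
    """Return (word, count) pairs, sorted alphabetically without regard to case."""
    if strip_tags:
        # Replace tags with a space so that adjacent words are not glued together.
        text = TAG.sub(" ", text)
    counts = Counter(WORD.findall(text))
    # Case-insensitive order; ties such as "Read"/"read" are broken by the raw string.
    return sorted(counts.items(), key=lambda item: (item[0].casefold(), item[0]))


sample = "Read me first.\n<b>Note:</b> read the <i>whole</i> file."
print(distinct_words(sample))                   # tag names "b" and "i" are counted
print(distinct_words(sample, strip_tags=True))  # only the text's own words remain
```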
Using the HTMLStripper for Proofreading
Although we had already integrated a spell-checker into the HTMLStripper, we were forced to remove it because it only supported a single language and it was clear that further languages would not be made available.
However, we have not given up on the idea and are looking into alternative solutions.
Even without an integrated spell-checker, the HTMLStripper has two features that simplify proofreading considerably:
1. The Stripped Contents Editor
2. The word lists on the Distinct Words List Tab Sheet
Because all hidden, on-demand/triggered texts, such as the cookie settings, are exposed, and distracting elements, such as background graphics and advertising images, are removed, detecting grammatical and/or usage errors on an HTML page is easier in the Stripped Contents Editor.
Typographical errors, on the other hand, can be identified more easily in the alphabetically sorted word lists on the Distinct Words List Tab Sheet.
The following two scenarios exemplify the issue.
a. Scenario 1 (Context Required)
Consider the following two sentences:
Eve gave Adam an appeal from the tree of forbidden fruit. To prevent being
banned from paradise, Adam requested a hearing before the court of apples.
Unless the sentences are from a satirical or nonsensical version of the Bible, the words "appeal" and "apples" were either misspelled or used in the wrong context, even though each of them is spelled correctly when examined individually (i.e. out of context).
As identifying these types of errors may require reading the page's entire text, they can be found far more easily in the Stripped Contents Editor than in the word lists on the Distinct Words List Tab Sheet.
b. Scenario 2 (Simple "Typos")
Consider the following abridged list of words:
accompanying
accounts
actual
acutal
all
also
...
Prerequisites
preserving
Previous
properies
Properties
purposes
...
Even going over this list only once, you will probably have noticed the two typographical errors, "acutal" and "properies".
They stick out quite distinctly because each of them is sorted directly next to the correctly spelled word, so that the same word seems to occur twice.
In fact, this example is taken from lists generated from two of our website's pages.
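The reason these typos stand out can be expressed as a simple check: in an alphabetically sorted list, a misspelling often lands right next to its correct form, and the two differ by only one or two edits. The following Python sketch flags such neighbouring pairs using the Levenshtein edit distance. It is a hypothetical illustration, not a feature of the HTMLStripper; the function names, the maximum distance of 2, and the minimum word length of 5 (which suppresses short pairs such as "all"/"also") are assumptions chosen for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # delete ca
                               current[j - 1] + 1,            # insert cb
                               previous[j - 1] + (ca != cb))) # substitute (free if equal)
        previous = current
    return previous[-1]


def suspicious_neighbours(words, max_distance=2, min_length=5):
    """Yield adjacent entries of a case-insensitively sorted list that are nearly identical."""
    ordered = sorted(set(words), key=str.casefold)
    for first, second in zip(ordered, ordered[1:]):
        a, b = first.casefold(), second.casefold()
        if a != b and min(len(a), len(b)) >= min_length and edit_distance(a, b) <= max_distance:
            yield first, second


sample = ["accompanying", "accounts", "actual", "acutal", "all", "also",
          "Prerequisites", "preserving", "Previous", "properies", "Properties", "purposes"]
print(list(suspicious_neighbours(sample)))
# prints: [('actual', 'acutal'), ('properies', 'Properties')]
```

Such a check can only point at candidates: a typo in the first letters of a word will not be sorted next to its correct form, and whether a flagged pair really is an error still has to be decided by a human.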
Summary
Both features can be used individually or in combination with one another, but they still require a human to detect errors in the text or words.
We have adopted the following proofreading method for our pages' texts:
1. We go down the list of words on the Distinct Words List Tab Sheet and, if we detect any "unusual" words and/or spellings,
2. we either read the relevant parts very closely or review the page's entire text in the Stripped Contents Editor and/or on the correctly displayed page in a browser.
Footnotes
*1
Depending on the Windows version and edition, the caption that precedes the drop-down list can vary.
Whereas it might read "Language" under one Windows version, it may read "Encoding" under another.
Furthermore, the captions we have used to name the drop-down list are those of English Windows editions.
However, all captions on the Save HTML Document Dialog are always in the primary user-interface language under which the HTMLStripper is running.