The "GetText" utility of Kryloff Technologies, Inc.
http://www.kryltech.com

 
1. General information

"GetText.exe" is a console file-to-text conversion utility, free for personal use, which extracts textual content from HTML, MS Office®, RTF, PDF®, HLP and other documents and saves it into text files. To perform the conversion, GetText uses KT Text Filters.

Important: Text Filters are provided as free components for Kryloff's products only. If you wish to use them in your own software applications, you should purchase a corresponding license. In addition to the right to use KT Text Filters in your software products and to distribute them along with those products worldwide and royalty-free, purchasing a license provides you with:

  • enhanced versions of the filters along with full documentation disclosing additional capabilities that are not made publicly available (such as Unicode output, which has been excluded from GetText; proper selection of an applicable filter DLL; and some others);
  • sample code in C++, C#, Delphi for Win32 and .NET, Visual Basic for Windows and for .NET demonstrating the use of KT Text Filters for memory-to-memory, memory-to-file, file-to-memory, and file-to-file filtering;
  • enrollment in Kryloff Technologies technical support.

    If you have not yet obtained a license from Kryloff Technologies, you may not distribute any of the filters, nor may you count on our technical support. See also: KT Text Filters End-User License Agreement.

    KT Text Filters are also used in the rest of the Kryloff Subject Search™ family of products, namely:

      Subject Search Spider: your personal information retrieval intelligent Web engine.
      Subject Search Scanner: scans files on local and network drives looking for a given phrase.
      Subject Search Siter: investigates Web sites looking for a phrase and finds information buried in them.
      Subject Search Pad: opens documents in text mode and locates files with similar contents.
      Subject Search Summarizer: creates brief summaries of and translates documents or Web pages you are reading.
      Subject Search Sleuth: searches in huge collections of files on PC and LAN; includes API for developers.
      Subject Search Server: lets visitors search your Web site for the information they are looking for.


    2. Calling GetText

    2.1. GetText.exe Free Edition accepts the following command-line parameters: Source Document, Destination Text File, and optionally, Text Filter DLL File Name. To extract textual contents from a single document, use the following command either under the Windows Command Prompt or in a batch file:
    GetText.exe "Full or relative path to Source Document" "Full or relative path to Destination Text File"
    or
    GetText.exe "Source Document" "Destination Text File" "KT Filter DLL File Name"
    Enclose a command-line parameter in quotation marks (") if it contains one or more spaces.

    If the optional parameter Text Filter DLL File Name is not specified, GetText scans the "Filters" subfolder of its root folder and looks for an appropriate Text Filter in the following order: first, DLLs whose file names do not contain "98"; then the remaining files. This order is the same on every platform. The required filter is selected solely by the extension of the file being filtered, without reading its contents. For example, to convert "MyFile.doc", GetText selects "DOCDLL.dll".
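
    The lookup order described above can be illustrated with a short Python sketch. This is not GetText's actual code: the function name is hypothetical, and because the order of files inside each group is not documented, the sketch simply keeps the order in which the names were given.

```python
def order_filter_dlls(dll_names):
    """Return DLL names in the documented lookup order.

    Names without "98" come first, then the remaining ones.
    Assumption: the order inside each group is the input order,
    since the documentation does not specify it.
    """
    without_98 = [name for name in dll_names if "98" not in name]
    with_98 = [name for name in dll_names if "98" in name]
    return without_98 + with_98


if __name__ == "__main__":
    # Example file names taken from this document.
    found = ["HTML98ME.dll", "HTM2TXT.dll", "XML.dll"]
    print(order_filter_dlls(found))
```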

    2.2. You may call GetText.exe directly, for example, by selecting the Windows menu items "Start", then "Run", and then typing in the full path to GetText.exe followed by the corresponding command-line parameters. To extract text from several documents at a time, you may also use it in batch (.BAT) files.

    2.3. Examples:
    a) To obtain the textual contents of "c:\My Documents\My File.htm" and save them into the file "c:\My Documents\My Filtered File.txt", issue the following command (the examples in this section assume that you have placed "GetText.exe" into the folder "c:\Kryloff"):
    c:\Kryloff\GetText.exe "c:\My Documents\My File.htm" "c:\My Documents\My Filtered File.txt"

    b) If you want GetText to apply a particular KT Text Filter, specify the third parameter. For example:
    c:\Kryloff\GetText.exe "c:\My Documents\My File.htm" "c:\My Documents\My Filtered File.txt" HTMDLL.dll

    c) To filter several documents with one command, first create a batch file (a file with the ".BAT" extension) using any text editor. For example:
    c:\Kryloff\GetText.exe "c:\My Documents\File1.htm" "c:\My Documents\File1.htm.txt" HTML98ME.dll
    c:\Kryloff\GetText.exe "c:\My Documents\File2.doc" "c:\My Documents\File2.doc.txt"
    ...
    c:\Kryloff\GetText.exe "c:\My Documents\File100.xls" "c:\My Documents\File100.xls.txt" XLSDLL.dll
    After composing the batch file, execute it, for example by double-clicking its icon in Windows Explorer. All documents mentioned in the batch file will then be processed, and the corresponding text (.txt) files will be created. You may compose more complex batch files as well; for example, you may check the GetText.exe exit code by including "IF ERRORLEVEL" statements after each call to GetText. The utility terminates with a non-zero exit code when it fails to filter a particular file for any reason (GetText also displays the reason).
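
    For long lists of documents, the batch file itself can be generated. The Python sketch below is one possible way to do it, not part of GetText: the function names are hypothetical, and it assumes that failure exit codes are positive, because "IF ERRORLEVEL 1" is true only for exit codes of 1 or higher.

```python
def quote(arg):
    """Wrap an argument in quotation marks if it contains a space."""
    return f'"{arg}"' if " " in arg else arg


def batch_lines(gettext_exe, jobs):
    """Build the lines of a .BAT file for a list of conversion jobs.

    Each job is (source, destination) or (source, destination, filter_dll).
    """
    lines = []
    for job in jobs:
        lines.append(" ".join(quote(part) for part in (gettext_exe, *job)))
        # Assumption: GetText reports failure with a positive exit code.
        lines.append(f"if errorlevel 1 echo Failed: {job[0]}")
    return lines


if __name__ == "__main__":
    jobs = [
        (r"c:\My Documents\File1.htm", r"c:\My Documents\File1.htm.txt", "HTML98ME.dll"),
        (r"c:\My Documents\File2.doc", r"c:\My Documents\File2.doc.txt"),
    ]
    # Print the batch file; redirect the output into a .BAT file to use it.
    print("\n".join(batch_lines(r"c:\Kryloff\GetText.exe", jobs)))
```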

    GetText.exe displays this document only when you call it without command-line parameters; to prevent the appearance of this window, call the utility as specified above in this section.


    3. Filtering components included in the original shipment

    Kryloff Technologies supplies GetText with the following filters:

    Filter file name  Description                                             Minimum platform required   File extensions supported
    ----------------  ------------------------------------------------------  --------------------------  -------------------------
    HLP2TXT.dll       converts MS Help (.HLP) files into plain text           Windows 95 and later        HLP
    HTM2TXT.dll*      converts HTML files into text                           Windows 95 and later        HTM, HTML, HTW, ASCX, ASP, ASPX, HHC, HTX, ODC, STM, XML
    XML.dll           converts XML files into text                            Windows 98 and later        XML, XSL
    PDF2TXT.dll       converts the Adobe PDF® files into text                 Windows 95 and later        PDF
    PPTDLL.dll        converts MS PowerPoint® (.PPT) presentations into text  Windows 2000, XP and later  PPT, POT, PPS
    RTF2TXT.dll*      extracts text from RTF (Rich Text) files                Windows 95 and later        RTF
    UNCD2TXT.dll*     converts Unicode TXT files into plain text ones         Windows 95 and later        TXT, LST, INI, LOG, CSS, INF, SCP, SCT, WSC, WTX, ZAP
    WPD2TXT.dll       extracts plain text from WPD (Word Perfect®) files      Windows 95 and later        WPD
    XLSDLL.dll        extracts text from MS Excel® (.XLS) spreadsheets        Windows 2000, XP and later  XLS, XLB, XLC, XLT
    DOCDLL.dll**      converts MS Word® (.DOC) documents into text            Windows 2000, XP and later  DOC, DOT

    * Recommended for use under Windows 2000, XP, 2003, Vista, and higher editions.
    ** GetText.exe skips files which have the .DOC extension but are actually RTF ones. A more advanced procedure of selecting an appropriate KT Text Filter will be given to you upon purchasing a license to use KT Text Filters.

    GetText selects the required filter solely by the extension of the file being filtered, without checking its contents. For example, to convert "MyFile.doc", it applies "DOCDLL.dll". If a filter that performs the required type of conversion is not found in the "Filters" subfolder, the utility copies the source file into the destination text file without any changes. If you execute GetText under an obsolete platform on which some of the filters do not function (for example, when you run GetText under Windows 98 and instruct it to process an MS Word® file), the utility emits a sound to warn you that it must be executed under a later version of MS Windows®.
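
    The extension-based selection can be sketched in Python as follows. This is an illustration only, not GetText's code: the function names are hypothetical, and the extension lists are copied from the filter table in this section. Where an extension is listed for more than one filter (XML), the sketch returns every candidate, because the documentation does not say which one GetText prefers.

```python
import ntpath

# Extensions per filter, copied from the filter table in this section.
FILTER_EXTENSIONS = {
    "HLP2TXT.dll": ["HLP"],
    "HTM2TXT.dll": ["HTM", "HTML", "HTW", "ASCX", "ASP", "ASPX",
                    "HHC", "HTX", "ODC", "STM", "XML"],
    "XML.dll": ["XML", "XSL"],
    "PDF2TXT.dll": ["PDF"],
    "PPTDLL.dll": ["PPT", "POT", "PPS"],
    "RTF2TXT.dll": ["RTF"],
    "UNCD2TXT.dll": ["TXT", "LST", "INI", "LOG", "CSS", "INF",
                     "SCP", "SCT", "WSC", "WTX", "ZAP"],
    "WPD2TXT.dll": ["WPD"],
    "XLSDLL.dll": ["XLS", "XLB", "XLC", "XLT"],
    "DOCDLL.dll": ["DOC", "DOT"],
}


def candidate_filters(document_path):
    """Return every listed filter that supports the document's extension."""
    # Only the extension is examined; the file contents are never read.
    extension = ntpath.splitext(document_path)[1].lstrip(".").upper()
    return [dll for dll, exts in FILTER_EXTENSIONS.items() if extension in exts]


def plan(document_path):
    """Describe what happens to a document: filtering or a plain copy."""
    filters = candidate_filters(document_path)
    return ("filter", filters) if filters else ("copy unchanged", [])


if __name__ == "__main__":
    for name in ["MyFile.doc", "data.xml", "archive.zip"]:
        print(name, plan(name))
```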

    Some filters create .INI files in their own folders, named after the filter file but with the ".INI" extension. This happens once GetText.exe has called such a filter at least once. You may open the corresponding .INI files and adjust the filter settings to match your requirements. For example, the filter "XML.dll" provides an option that determines whether it extracts the entire content of the XML files being filtered (converting it into an appropriate code page if necessary) or extracts only the text enclosed in XML tags, skipping the tags themselves.

    Important: As mentioned above, GetText.exe builds single-byte (ANSI) or DBCS text files only (depending on the currently selected platform code page). The ability to generate text files in Unicode becomes available when you purchase a license to use and distribute KT Text Filters.


    4. Using, distributing, and purchasing KT Text Filters

    If you have obtained the GetText utility and/or KT Text Filters from the Kryloff Technologies Web site or other sources, and have not yet purchased a license to use the filters, you may use GetText and KT Text Filters personally on one computer on a royalty-free basis for as long as you need. To redistribute or reproduce any components of the software, either in part or in whole, you must purchase a license. Any reproduction or redistribution of GetText, KT Text Filters, supplementary files and documentation not in accordance with the KT Text Filters End-User License Agreement is expressly prohibited by law, and may result in severe civil and criminal penalties.

    Additional information about your right to use and (re-)distribute GetText and KT Text Filters is provided in the KT Text Filters End-User License Agreement, Basic License. If you wish to obtain the source code of KT Text Filters (for example, to use them under operating systems other than Windows), or if you have any other questions or concerns regarding GetText or KT Text Filters, contact Kryloff Technologies at http://www.kryltech.com/feedback.htm


    5. System requirements

  • Windows 95/98/ME or Windows NT 4.0/2000/XP/2003/Vista (the entire functionality of the utility is available only under the platforms that all supplied filters support; see section 3).

    Copyright © Kryloff Technologies, Inc. http://www.kryltech.com

    KT Text Filters™, Subject Search™, Subject Search Spider (SSSpider™), Subject Search Scanner (SSScanner™), Subject Search Siter (SSSiter™), Subject Search Pad™ (SSPad™), Subject Search Summarizer (SSSummarizer™), Subject Search Sleuth (SSSleuth™), Subject Search Server (SSServer™), and Subject Search Suite™ (SSSuite™) are trademarks of Kryloff Technologies, Inc. Other products or companies mentioned in this document are copyrights and/or trademarks of their respective companies.