Usage notes for agc

Ag is a fast, scriptable anagram generator. These notes describe how to use the command-line version of Ag, known as "agc". The examples given below assume you've installed the agc binary in a suitable location so you can run it by simply typing agc in a terminal window.

Contents:

    Usage
    Some not-so-obvious tips
    How to find good anagrams
    Pattern matching
    Lexicon files
    Using a text file as the lexicon
    Numeric lexicons
    Scripting functions
    String arrays

Usage

If you type agc by itself you'll get version and usage information:

    This is agc version 2.0 (with Lua version 5.4.8).
    Usage: agc [options] word or phrase to be anagrammed [options]
    Options:
    -a N            print at most N anagrams (default is unlimited)
    -c word         print anagrams containing the given word
    -h              print this help information
    -i              print anagrams with increasing word lengths
    -l lexicon      use the given lexicon file (default is Words.lex)
    -m pattern      only print lexicon/usable words that match pattern
    -n N            print N usable words per line (default is 10)
    -o newlexicon   save current lexicon in the given file
    -p              only print words in lexicon
    -r script       run the given Lua script
    -t textfile     use the given text file (UTF-8 encoded) as lexicon
    -u              only print usable words, by increasing length
    -ua             only print usable words, in alphabetical order
    -U              print all words in UPPERCASE
    -w MIN,MAX      minimum and maximum words in anagrams (default is 1,10)

Some not-so-obvious tips

If you type a command without using the -l or -t options then agc will look for a lexicon file called Words.lex in the current directory. If it can't find Words.lex it will look for Lexicons/Words.lex.

Any uppercase letters in the text to be anagrammed are automatically converted to lowercase. Spaces are ignored, as is most punctuation (.,:;'"). These commands are equivalent:
```
    agc "andrew trevorrow"
    agc Andrew Trevorrow.
    agc andrewtrevorrow
```
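
To make that rule concrete, here is a rough Python sketch of the clean-up step. It only illustrates the behaviour described above and is not Ag's actual code; the name normalize is made up for this example, and it strips only the space and the punctuation characters listed above.

```python
# Illustrative only: mimics the documented clean-up of the text to be anagrammed.
IGNORED = set(" .,:;'\"")   # space plus the punctuation listed above

def normalize(text):
    """Lowercase the text and drop the ignored characters (hypothetical helper)."""
    return "".join(ch for ch in text.lower() if ch not in IGNORED)

print(normalize("Andrew Trevorrow."))
```

This prints andrewtrevorrow, which is why the three commands above give the same results.
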
The order in which you type in options and the text doesn't matter. Spaces between options and their values are optional. The following examples are equivalent:
```
    agc -a 3 andrew trevorrow
    agc andrew -a3 trevorrow
    agc andrewtrevorrow -a3
```
If the -w option is given a single number then MIN equals MAX. This example will only generate one-word anagrams:
```
    agc andrew -w1
```
No anagrams are generated if any of these options are used:
```
    -m, -n, -o, -p, -r, -u, -ua
```
The following command saves all words in the default lexicon to a text file:
```
    agc -p > words.txt
```
The words will be in alphabetical order, one word per line. The last line will show the total number of words.

Ag supports many non-English languages. For example:
```
    agc -l French.lex œuvré
```
If the supplied text has any non-ASCII letters (as in the above example) then they must be UTF-8 encoded, so you might need to change your shell's settings. If using the Mac's Terminal app, open the Preferences dialog, go to Settings > Advanced and make sure the character encoding is set to "Unicode (UTF-8)". On Windows you'll need to enter the command "chcp 65001".

How to find good anagrams

There are plenty of programs that can generate anagrams from a given text. Ag tries to make it easier to find interesting anagrams. It does this by splitting the process into two steps:

1. Find the usable words. These are all the words from the current lexicon that can be made out of the letters in the supplied text. The -u or -ua options will print out the usable words (no anagrams will be generated):
```
    agc andrewtrevorrow -u     (prints usable words by increasing length)
    agc andrewtrevorrow -ua    (prints usable words in alphabetical order)
```
For a long piece of text there might be thousands of usable words, so you might want to use the -m option to find only the words that match a certain pattern. This example will only print words starting with "ov":
```
    agc andrewtrevorrow -u -m"ov*"
```
The following section has all the details about pattern matching.

2. Select one or more usable words that look interesting and then generate anagrams containing those words by using the -c option:
```
    agc andrewtrevorrow -c overt
    agc andrewtrevorrow -c overt -c word
```
The 2nd example will only generate anagrams containing both words.
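
If you're curious what "usable" means in practice, the two steps can be sketched in a few lines of Python. This is only an illustration of the idea (letter counting), not how Ag is implemented; the function names and the tiny four-word lexicon are made up for the example.

```python
from collections import Counter

def usable(word, text):
    """True if word can be spelled using only the letters available in text."""
    return not (Counter(word) - Counter(text))

def remaining(text, *chosen):
    """Letters of text left over after removing the chosen words, sorted."""
    left = Counter(text)
    left.subtract(Counter("".join(chosen)))
    return "".join(sorted(left.elements()))

lexicon = ["overt", "word", "warren", "zebra"]   # tiny made-up lexicon
text = "andrewtrevorrow"

print([w for w in lexicon if usable(w, text)])   # step 1: usable words
print(remaining(text, "overt", "word"))          # step 2: letters still to be used
```

The first line printed is ['overt', 'word', 'warren'] and the second is aenrrw, which are exactly the letters of "warren". Fixing a couple of words with -c shrinks the remaining search in much the same way.
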

Pattern matching

The -m option can be used to print out only the lexicon words or usable words that match a given pattern. Let's look at some simple examples:

    agc andrew -u -m"*a*"    (print usable words containing the letter "a")
    agc andrew -ua -m"*a*"   (ditto, but print the words alphabetically)

You don't need to specify -u if you use -m and supply some text to be anagrammed. In that case it's assumed you want to match usable words:

    agc andrew -m"re*"       (print usable words starting with "re")
    agc andrew -m"~re*"      (print usable words that don't start with "re")
    agc andrew -m"*re"       (print usable words ending with "re")
    agc andrew -m"???"       (print usable words with 3 letters)

Similarly, you don't need to specify -p if you use -m without supplying any anagram text. In that case it's assumed you want to match lexicon words:

    agc -m"?9"               (print lexicon words with 9 letters)
    agc -m"?7-9"             (print lexicon words with 7 to 9 letters)
    agc -m"?7-"              (print lexicon words with at least 7 letters)
    agc -m"?-7"              (print lexicon words with at most 7 letters)
    agc -m"*[xyz]"           (print lexicon words ending in "x" or "y" or "z")
    agc -m"*x*&*y*"          (print lexicon words containing "x" and "y")
    agc -m"*(ab|xy)*"        (print lexicon words containing "ab" or "xy")
    agc -m"[~aeiou]-"        (print lexicon words with no vowels)

Note that it's usually best to enclose pattern strings in double quotes. This is because some of Ag's special pattern characters also have a special meaning to the shell. Here are Ag's special characters:

    *        Match zero or more letters.

    ?        Match any single letter.

    [...]    Match any letter in the given list; eg. [abc].
    [~...]   Match any letter NOT in the given list; eg. [~aeiou].

    N        Specifies a fixed repeat count (N is a non-negative integer).
             Repeat counts are only allowed after ?, ], or a letter; eg. ?9.
        
    M-N      Specifies a variable repeat count, where M and N are optional
             non-negative integers indicating the minimum and maximum counts.
             If M is missing then 0 is assumed, and if N is missing then
             infinity is assumed.  Note that * is equivalent to ?-.

    <...>    Any repeat count can be enclosed in angle brackets; eg. <2-5>.
             This form of a repeat count is necessary when using a numeric
             lexicon (see below).

    -        Used inside [...] to indicate a letter range; eg. [a-z],
             or to separate min and max repeat counts; eg. ?2-5.

    (...)    Match a sub-pattern; eg. *(ed|ing).

    |        Means OR.  For matching alternative patterns; eg. a*|b*.

    &        Means AND.  For matching combined patterns; eg. *a*&*b*.

    ~        Means NOT.  Can only be the first character in the pattern
             or the first character after ( or [.

    !        Also means NOT but best avoided when using agc, since many
             shells treat ! specially (even inside double quotes).
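
If you already know regular expressions, it might help to see how a few of these characters line up with regex syntax. The Python sketch below translates only a small subset (letters, *, ?, [...], [~...] and repeat counts) and raises an error for everything else, including |, &, ~, ! and parentheses. It is a rough analogy for non-numeric lexicons, not Ag's matcher, and the names to_regex and matches are made up for this example.

```python
import re

# A repeat count: M-N, M-, -N, N or a bare "-".
COUNT = r"\d+-\d*|-\d+|\d+|-"

# One pattern element: either "*", or an atom (?, [...], [~...] or a letter)
# followed by an optional repeat count, which may be wrapped in <...>.
TOKEN = re.compile(
    r"(?P<star>\*)|"
    r"(?P<atom>\?|\[~?[^\]]+\]|[^\W\d_])"
    r"(?P<count><(?:" + COUNT + r")>|" + COUNT + r")?"
)

def to_regex(pattern):
    """Translate a simple Ag-style pattern into a Python regex string."""
    out, pos = [], 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if not m:
            raise ValueError("unsupported pattern syntax at %r" % pattern[pos:])
        if m.group("star"):
            piece = ".*"                      # * means zero or more letters
        else:
            atom, count = m.group("atom"), m.group("count")
            if atom == "?":
                piece = "."                   # any single letter
            elif atom.startswith("[~"):
                piece = "[^" + atom[2:]       # negated letter list
            else:
                piece = atom                  # [...] or a plain letter
            if count:
                count = count.strip("<>")
                if "-" in count:
                    low, high = count.split("-")
                    piece += "{%s,%s}" % (low or "0", high)
                else:
                    piece += "{%s}" % count
        out.append(piece)
        pos = m.end()
    return "".join(out)

def matches(pattern, word):
    """True if the whole word matches the translated pattern."""
    return re.fullmatch(to_regex(pattern), word) is not None

print(to_regex("?7-"), to_regex("re*"), to_regex("[~aeiou]-"))
```

This prints .{7,} re.* [^aeiou]{0,} and, for example, matches("*[xyz]", "fox") is true while matches("*[xyz]", "dog") is false.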

Lexicon files

A lexicon file is a binary file containing a list of words in a special format that allows the file to be loaded into memory very quickly. The format also allows specific words to be found very quickly. All lexicon files supplied with Ag have a .lex extension. It's not strictly required, but it's a good idea to stick to that convention when naming your own lexicon files.

All words in a lexicon consist of 1 to 30 lowercase letters, possibly including some non-ASCII letters. The full set of valid letters is:

    a to z and áàâäãåçéèêëíìîïñóòôöõúùûüßæøœÿı

Technical note: Non-ASCII letters are stored in the MacRoman encoding. This allows lexicons to support many non-English languages (French, German, Italian, Spanish, etc) but retain the simplicity of one byte per letter. Although Ag uses the MacRoman encoding internally, when it prints out words it uses the UTF-8 encoding.
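
The one-byte-per-letter point is easy to check outside Ag. The Python sketch below uses Python's own standard "mac_roman" codec (nothing to do with Ag's code) to compare the two encodings; the NFC normalization is just a precaution so that accented letters are in their precomposed form.

```python
import unicodedata

# The non-ASCII lexicon letters listed above, in precomposed (NFC) form.
EXTRA = unicodedata.normalize("NFC", "áàâäãåçéèêëíìîïñóòôöõúùûüßæøœÿı")

word = unicodedata.normalize("NFC", "œuvré")
macroman = word.encode("mac_roman")   # one byte per letter
utf8 = word.encode("utf-8")           # each non-ASCII letter here takes two bytes

print(len(word), len(macroman), len(utf8))
print(len(EXTRA.encode("mac_roman")) == len(EXTRA))
```

The first line printed is 5 5 7 and the second is True, meaning every extra letter fits in a single MacRoman byte.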

Note for Mac users: Lexicon files are in the same format as the "word list" files used by my defunct Anagrams app, so there's no problem using those word lists with Ag.

Using a text file as the lexicon

The -t option allows any text file (in UTF-8 encoding) to be used as the lexicon:

    agc -t foo.txt -p        (prints the unique words in foo.txt)
    agc -t foo.txt andrew    (generates anagrams using the words in foo.txt)

All of the unique words in the given file will be extracted, but only if they are "valid" words. A valid word is a contiguous sequence of 1 to 30 lowercase letters (see the previous section for the set of valid letters) delimited by any of these characters:

    NUL to space (this includes TAB, CR, LF) and !"(),.:;?¿¡«»… —“”

The fourth character from the end of that list is a non-breaking space.
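
Here is a rough Python sketch of that extraction rule. It makes some assumptions that the description above doesn't settle: any chunk containing a character that is neither a valid letter nor a delimiter (an uppercase letter, an apostrophe, a digit) is simply skipped, and the result is sorted by code point. Ag itself may behave differently in those cases, and the name unique_words is made up for this example.

```python
import re

# Valid lexicon letters (see the previous section).
LETTERS = "abcdefghijklmnopqrstuvwxyzáàâäãåçéèêëíìîïñóòôöõúùûüßæøœÿı"

# Delimiters: NUL to space, plus the punctuation listed above. The last four
# are written as escapes: non-breaking space, long dash, curly double quotes.
DELIMS = '!"(),.:;?¿¡«»…' + "\u00a0\u2014\u201c\u201d"

SPLIT = re.compile(r"[\x00-\x20" + re.escape(DELIMS) + "]+")
VALID = re.compile("[" + LETTERS + "]{1,30}")

def unique_words(text):
    """Return the sorted unique valid words in text (assumption-laden sketch)."""
    chunks = SPLIT.split(text)
    return sorted({c for c in chunks if VALID.fullmatch(c)})

print(unique_words("the cat, the dog; Cat it's"))
```

This prints ['cat', 'dog', 'the']. Under the assumptions above, "Cat" and "it's" are dropped because they contain characters that are not valid lowercase letters.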

The -o option can be used to save the current lexicon in a new lexicon file. You'll get an error message if you try to overwrite an existing file. Some examples:

    agc -t foo.txt -o Foo.lex (creates a lexicon file with the words in foo.txt)
    agc -o NewWords.lex       (copies the default lexicon file)

Numeric lexicons

A numeric lexicon contains "words" consisting entirely of digits (0 to 9) rather than letters. Creating a numeric lexicon is quite simple. Just use the -t option to load a text file that begins with a digit. Ag will then extract all the unique numbers (ie. digit strings) in the file and build a numeric lexicon. You can add the -o option to save the results in a .lex file.

There is one potential source of confusion when entering patterns to match numbers in a numeric lexicon. If you want to specify a repeat count then you must surround it with angle brackets. This avoids any ambiguity, as should be clear from the following examples:

    agc -l Primes.lex -m"?3"    (print all 2-digit primes ending with 3)
    agc -l Primes.lex -m"?<3>"  (print all 3-digit primes)

Note that angle brackets can also be used in patterns for non-numeric lexicons, so you might prefer to use them all the time.

Scripting functions

This section describes all the ag.* functions that can be used in a script run by agc.

ag.adjacent
ag.anagrams
ag.check
ag.clear
ag.exit
ag.getchars
ag.getdir
ag.getstring
ag.gui
ag.help
ag.lexiconpath
ag.lexiconwords
ag.loadlexicon
ag.lower
ag.note
ag.numchars
ag.numeric
ag.savelexicon
ag.setoption
ag.upper
ag.usablewords
ag.wordfound

ag.adjacent(word1, word2)

Returns true if the given lexicon words differ by only 1 letter. It assumes the words have the same number of letters (note that they might not have the same number of bytes if they contain non-ASCII letters). This function is used to speed up the creation of adjacency graphs in Scripts/ladders.lua.

Example: if ag.adjacent("dog","dig") then print("adjacent") end
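
For readers who want to see the test spelled out, here is a minimal Python sketch of what "differ by only 1 letter" is taken to mean here: the same length and exactly one position where the letters differ. It is an illustration only, not Ag's implementation. Unlike ag.adjacent it checks the lengths itself, and it compares characters rather than bytes.

```python
def adjacent(word1, word2):
    """True if two equal-length words differ in exactly one position."""
    if len(word1) != len(word2):
        return False
    return sum(a != b for a, b in zip(word1, word2)) == 1

print(adjacent("dog", "dig"), adjacent("dog", "dog"), adjacent("dog", "cat"))
```

This prints True False False.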

ag.anagrams(text)

Returns an array of anagrams for the given text. Each anagram is a single string, where multiple words are separated by spaces. See the description of ag.setoption for how to change various options used by this function.

Example: local a = ag.anagrams("andrewtrevorrow")

ag.check()

This function does nothing in agc. (In Ag it is used to allow a long-running script to be stopped.)

ag.clear(n)

Clears the last n lines, or the entire terminal window if no argument is supplied.

Example: if numlines > 1000 then ag.clear() end

ag.exit(message)

Exits the script with an optional error message. If a message is supplied then it will be printed. If no message is supplied then nothing will be printed.

Example: if i < 0 then ag.exit("Oops!") end

ag.getchars(text)

Returns an array containing each character (possibly non-ASCII) in the given text.

Example: local chars = ag.getchars(word)

ag.getdir(dirname)

Ignores the given string and returns ".\" on Windows or "./" on Mac/Linux. (This function is best used in scripts run by Ag.)

ag.getstring(prompt, default)

Prints the given prompt. If a non-empty default string is supplied then the prompt is followed by "[default = X]" (where X is the default string) and "> " on two extra lines. It then waits for user input and returns the entered string, or the default string if the user just hits the enter/return key.

Example: local n = tonumber(ag.getstring("Enter a number:","100"))

ag.gui()

Returns true if Ag is running the script, or false if agc is running the script.

Example: if not ag.gui() then print("this is agc") end

ag.help(htmlfile)

Prints the given file path prefixed by "Open this HTML file: ".

Example: ag.help("results.html")

ag.lexiconpath()

Returns the path given to the -l option, or "Words.lex" if that option is not used.

Example: print("Current lexicon is", ag.lexiconpath())

ag.lexiconwords(pattern="*")

Returns an array of words from the current lexicon that match the given pattern. If no pattern is supplied then the array contains all the words in the lexicon. See the description of ag.setoption for how to change a couple of options used by this function.

Example: local zwords = ag.lexiconwords("z*")

ag.loadlexicon(filepath)

Loads the given file and sets the current lexicon used by future calls of ag.lexiconwords, ag.anagrams and ag.usablewords. A non-absolute path is relative to the current working directory. If the given file is not an Ag lexicon file then it's assumed to be a text file and the unique words or numbers will be extracted, as described above.

Example: ag.loadlexicon("Lexicons/Spanish.lex")

ag.lower(string)

Returns the lowercase version of the given string. Unlike Lua's string.lower function, ag.lower will correctly convert any uppercase non-ASCII letters in the string, as long as they are valid lexicon letters.

Example: print(ag.lower("ÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÆØŒŸ"))

ag.note(message)

Prints the given message prefixed by "NOTE: ".

Example: ag.note("hello")

ag.numchars(text)

Returns the number of characters (possibly non-ASCII) in the given text.

Example: if ag.numchars(word) == 3 then print(word) end

ag.numeric()

Returns true if the current lexicon is a numeric lexicon.

Example: if ag.numeric() then print("lexicon is numeric") end

ag.savelexicon(filepath)

Saves the current lexicon in the given file. A non-absolute path is relative to the current working directory.

Example: ag.savelexicon("mylexicon.lex")

ag.setoption(name, value)

Sets various options that change the behavior of later ag.* calls. Here are all the option names, their possible values, and the ag.* functions they affect:

    "minlength"      minimum word length (1 to 30)
                     used by ag.anagrams, ag.usablewords

    "maxlength"      maximum word length (1 to 30)
                     used by ag.anagrams, ag.usablewords

    "minwords"       minimum words in an anagram (1 to 50)
                     used by ag.anagrams

    "maxwords"       maximum words in an anagram (1 to 50)
                     used by ag.anagrams

    "maxanagrams"    maximum number of anagrams (0 = no limit)
                     used by ag.anagrams

    "increase"       anagram words increase in length (true or false)
                     used by ag.anagrams

    "alphabetic"     return words in alphabetical order (true or false)
                     used by ag.lexiconwords, ag.usablewords

    "uppercase"      return words in uppercase (true or false)
                     used by ag.anagrams, ag.lexiconwords, ag.usablewords

Example: ag.setoption("maxanagrams", 1000)

ag.upper(string)

Returns the uppercase version of the given string. Unlike Lua's string.upper function, ag.upper will correctly convert any lowercase non-ASCII letters in the string, as long as they are valid lexicon letters.

Example: print(ag.upper("áàâäãåçéèêëíìîïñóòôöõúùûüæøœÿ"))

ag.usablewords(text)

Returns an array of all the words that can be made from the letters in the given text. See the description of ag.setoption for how to change various options used by this function.

Example: local u = ag.usablewords("retsina")

ag.wordfound(word)

Returns true if the given word exists in the current lexicon.

Example: if ag.wordfound("cat") then patcat() end

String arrays

A number of ag.* functions return arrays containing strings. These are standard Lua arrays — that is, tables indexed by integers where the first string has index 1. All the strings use the UTF-8 encoding and might contain non-ASCII characters, so to count the characters in a string you should avoid using the standard Lua length operator "#" as it returns the number of bytes. Instead, use ag.numchars to return the number of characters:

    if ag.numchars(word) > 2 then print(word) end

And to extract each character in a UTF-8 string you can use ag.getchars:

    for _, ch in ipairs(ag.getchars(word)) do print(ch) end

Note that the Lua functions string.lower and string.upper don't convert non-ASCII characters correctly, so use ag.lower and ag.upper instead.

* * * * *

Well done if you managed to read this far. Now go have some fun!

Author: Andrew Trevorrow (andrew@trevorrow.com) aka Overt Word Warren.