Introduction

Ag is a fast and friendly anagram generator. These notes describe how to use the command-line version of Ag, known as "agc". The examples below assume you've installed the agc binary in a suitable location so you can run it by simply typing agc in a terminal window. If you type agc by itself you'll get this brief help:

   Usage: agc [options] word or phrase to be anagrammed [options]
   Options:
   -a N            print at most N anagrams (default is unlimited)
   -c word         print anagrams containing given word
   -h              print this help information
   -i              print anagrams with increasing word lengths
   -l lexicon      use given lexicon file (default is Words)
   -m pattern      only print lexicon/usable words that match pattern
   -n N            print N usable words per line (default is 10)
   -o newlexicon   save current lexicon in given file
   -p              only print words in lexicon
   -t textfile     use given text file (UTF-8 encoded) as lexicon
   -u              only print usable words, by increasing length
   -ua             only print usable words, in alphabetical order
   -U              print all words in UPPERCASE
   -w MIN,MAX      minimum and maximum words in anagrams (default is 1,10)

Some not-so-obvious tips

  • If you type a command without using the -l or -t options then agc will look for a lexicon file called Words in the current directory. If it can't find Words it will look for Lexicons/Words.

  • Any uppercase letters in the text to be anagrammed are automatically converted to lowercase. Spaces are ignored, as is most punctuation (.,:;'"). These commands are equivalent:
       agc "andrew trevorrow"
       agc Andrew Trevorrow.
       agc andrewtrevorrow
    
  • The order in which you type in options and the text doesn't matter. Spaces between options and their values are optional. The following examples are equivalent:
       agc -a 3 andrew trevorrow
       agc andrew -a3 trevorrow
       agc andrewtrevorrow -a3
    
  • If the -w option is given a single number then MIN equals MAX. This example will only generate one-word anagrams:
       agc andrew -w1
    
  • No anagrams are generated if any of these options is used:
       -m, -n, -o, -p, -u, -ua
    
  • The following command saves all words in the default lexicon to a text file:
       agc -p > words.txt
    
    The words will be in alphabetical order, one word per line. The last line will show the total number of words.

  • Ag supports many non-English languages. For example:
       agc -l French œuvré
    
    If the supplied text has any non-ASCII letters (as in the above example) then they must be UTF-8 encoded, so you might need to change your shell's settings. If using the Mac's Terminal app, open the Preferences dialog, go to Settings > Advanced and make sure the character encoding is set to "Unicode (UTF-8)".
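
As a rough illustration of the second tip above (lowercasing, and ignoring spaces and common punctuation), here is a minimal Python sketch. It is not Ag's actual code: the function name normalize is hypothetical, and the set of ignored characters is assumed to be exactly the space plus the six punctuation marks listed in that tip.

```python
# Illustrative sketch only; not taken from Ag's source code.
# Assumption: the ignored characters are exactly the space plus . , : ; ' "
IGNORED = set(" .,:;'\"")

def normalize(text):
    """Lowercase the text and drop spaces and the listed punctuation."""
    return "".join(ch for ch in text.lower() if ch not in IGNORED)

# The three equivalent commands in the tip reduce to the same letters.
print(normalize("andrew trevorrow"))
print(normalize("Andrew Trevorrow."))
print(normalize("andrewtrevorrow"))
```

Under these assumptions all three calls print the same string, which is why the three commands behave identically.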

How to find good anagrams

There are plenty of programs that can generate anagrams from a given text. Ag tries to make it easier to find interesting anagrams. It does this by splitting the process into two steps:

  1. Find the usable words. These are all the words from the current lexicon that can be made out of the letters in the supplied text. The -u or -ua options will print out the usable words (no anagrams will be generated):
       agc andrewtrevorrow -u     (prints usable words by increasing length)
       agc andrewtrevorrow -ua    (prints usable words in alphabetical order)
    
    For a long piece of text there might be thousands of usable words, so you might want to use the -m option to find only the words that match a certain pattern. This example will only print words starting with "ov":
       agc andrewtrevorrow -u -m"ov*"
    
    The following section has all the details about pattern matching.

  2. Select one or more usable words that look interesting and then generate anagrams containing those words by using the -c option:
       agc andrewtrevorrow -c overt 
       agc andrewtrevorrow -c overt -c word
    
    The second example will only generate anagrams containing both words.
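
The two steps above can be sketched in Python using letter counts. This is only an illustration of the idea, not Ag's algorithm: the function names usable_words and letters_left and the tiny word list are made up for the example, and the text is assumed to be already reduced to lowercase letters.

```python
from collections import Counter

# Illustrative sketch only; Ag's real search is not shown in these notes.

def usable_words(text, lexicon):
    """Words that can be built from the letters of text, shortest first."""
    letters = Counter(text)
    # Counter subtraction keeps only positive counts, so an empty result
    # means the word needs no letter that the text cannot supply.
    usable = [w for w in lexicon if not (Counter(w) - letters)]
    return sorted(usable, key=lambda w: (len(w), w))

def letters_left(text, required):
    """Letters still available after reserving every required word.

    Returns None if the required words cannot all be made from text.
    """
    remaining = Counter(text)
    for word in required:
        need = Counter(word)
        if need - remaining:
            return None
        remaining -= need
    return "".join(sorted(remaining.elements()))

# Hypothetical mini-lexicon, just for the demonstration.
lexicon = ["overt", "word", "warren", "zebra", "tree", "an", "dew"]
print(usable_words("andrewtrevorrow", lexicon))
print(letters_left("andrewtrevorrow", ["overt", "word"]))
```

Reserving "overt" and "word" leaves exactly the letters of "warren", so any anagram containing both words must be completed from those six letters. This is why choosing words with -c narrows the search so effectively.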

Pattern matching

The -m option can be used to print out only the lexicon words or usable words that match a given pattern. Let's look at some simple examples:

   agc andrew -u -m"*a*"    (print usable words containing the letter "a")
   agc andrew -ua -m"*a*"   (ditto, but print the words alphabetically)

Note that you don't need to specify -u if you use -m and supply some text to be anagrammed. In that case it's assumed you want to match usable words:

   agc andrew -m"re*"       (print usable words starting with "re")
   agc andrew -m"\!re*"     (print usable words that don't start with "re")
   agc andrew -m!"re*"      (ditto)
   agc andrew -m"*re"       (print usable words ending with "re")
   agc andrew -m"???"       (print usable words with 3 letters)

Similarly, you don't need to specify -p if you use -m without supplying any anagram text. In that case it's assumed you want to match lexicon words. These examples all match lexicon words:

   agc -p -m"?9"            (print lexicon words with 9 letters)
   agc -p -m"?7-9"          (print lexicon words with 7 to 9 letters)
   agc -p -m"?7-"           (print lexicon words with at least 7 letters)
   agc -p -m"?-7"           (print lexicon words with at most 7 letters)
   agc -m"*[xyz]"           (print lexicon words ending in "x" or "y" or "z")
   agc -m"*(ab|xy)*"        (print lexicon words containing "ab" or "xy")
   agc -m"[\!aeiou]-"       (print lexicon words with no vowels)
   agc -m[!"aeiou]-"        (ditto)

Note that it's usually best to enclose pattern strings in double quotes, because most of Ag's special pattern characters also have a special meaning to the shell. The exclamation mark (!) can be tricky: inside double quotes you might have to type "\!" rather than "!", or else put it outside the double quotes (the above examples show both methods).

Here are Ag's special pattern characters:

   *        Match zero or more letters.

   ?        Match any single letter.

   [...]    Match any letter in the given list; eg. [abc].
   [!...]   Match any letter NOT in the given list; eg. [!aeiou].

   N        Specifies a fixed repeat count (N is a non-negative integer).
            Repeat counts are only allowed after ?, ], or a letter; eg. ?9.
            
   M-N      Specifies a variable repeat count, where M and N are optional
            non-negative integers indicating the minimum and maximum counts.
            If M is missing then 0 is assumed, and if N is missing then
            infinity is assumed.  Note that * is equivalent to ?-.

   -        Used inside [...] to indicate a letter range; eg. [a-z],
            or to separate min and max repeat counts; eg. ?2-5.

   (...)    Match a sub-pattern; eg. *(ed|ing).

   |        Means OR.  For matching alternative patterns; eg. a*|b*.

   !        Means NOT.  Can only be the first character in the pattern
            or the first character after [.
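
To make the pattern rules above concrete, here is a small Python sketch that translates an Ag-style pattern into a regular expression and matches it against a whole word. It is an approximation written from the table above, not Ag's own matcher: the names to_regex and matches are hypothetical, only the ASCII letters a to z are treated as letters, and malformed patterns are not checked.

```python
import re

# Illustrative sketch only; not Ag's actual matcher.
# Assumption: only the ASCII letters a-z count as letters here.
LETTER = "[a-z]"
COUNT = re.compile(r"(\d*)(-?)(\d*)")   # N, M-N, M-, -N or a lone -

def to_regex(pattern):
    """Return (negated, compiled_regex) for an Ag-style pattern."""
    negated = pattern.startswith("!")    # leading ! means NOT
    if negated:
        pattern = pattern[1:]
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        countable = True                 # may a repeat count follow?
        if ch == "*":
            out.append(LETTER + "*")
            countable = False
            i += 1
        elif ch == "?":
            out.append(LETTER)
            i += 1
        elif ch == "[":
            end = pattern.index("]", i)
            body = pattern[i + 1:end]
            if body.startswith("!"):     # [!...] becomes [^...]
                body = "^" + body[1:]
            out.append("[" + body + "]")
            i = end + 1
        elif ch in "(|)":
            out.append("(?:" if ch == "(" else ch)
            countable = False
            i += 1
        else:
            out.append(re.escape(ch))    # an ordinary letter
            i += 1
        if countable:
            m = COUNT.match(pattern, i)  # always matches, possibly empty
            low, dash, high = m.groups()
            if dash:                     # missing M is 0, missing N is unbounded
                out.append("{%s,%s}" % (low or "0", high))
            elif low:
                out.append("{%s}" % low)
            i = m.end()
    return negated, re.compile("".join(out))

def matches(pattern, word):
    """True if the whole word matches the pattern (respecting a leading !)."""
    negated, regex = to_regex(pattern)
    return (regex.fullmatch(word) is not None) != negated

print(matches("re*", "red"), matches("!re*", "red"), matches("?7-9", "anagrams"))
```

The sketch matches the whole word rather than a substring, which is why "*a*" is needed to find words merely containing "a".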

Lexicon files

A lexicon file is a binary file containing a list of words in a special format that allows the file to be loaded into memory very quickly. The format also allows specific words to be found very quickly.

All words in a lexicon consist of 1 to 30 lowercase letters, possibly including some non-ASCII letters. The full set of valid letters is:

   a to z and áàâäãåçéèêëíìîïñóòôöõúùûüßæøœÿı

Technical note: Non-ASCII letters are stored in the MacRoman encoding. This allows lexicons to support many non-English languages (French, German, Italian, Spanish, etc) but retain the simplicity of one byte per letter. Although Ag uses the MacRoman encoding internally, when it prints out words it uses the UTF-8 encoding.
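
The one-byte-per-letter property can be checked with Python's built-in mac_roman codec. This sketch only demonstrates the two encodings; it says nothing about the layout of Ag's lexicon files, and the variable names are made up for the example.

```python
import unicodedata

# Illustrative sketch only; shows the encodings, not Ag's lexicon file format.
# NFC normalization makes sure each accented letter is a single code point.
NON_ASCII = unicodedata.normalize("NFC", "áàâäãåçéèêëíìîïñóòôöõúùûüßæøœÿı")

# Every listed non-ASCII letter fits in a single MacRoman byte.
one_byte_each = all(len(ch.encode("mac_roman")) == 1 for ch in NON_ASCII)

word = unicodedata.normalize("NFC", "œuvré")
stored = word.encode("mac_roman")                      # one byte per letter
printed = stored.decode("mac_roman").encode("utf-8")   # what gets written out

print(one_byte_each, len(stored), len(printed))
```

The word "œuvré" takes five bytes in MacRoman (one per letter) but seven in UTF-8, because "œ" and "é" each need two bytes there.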

Note for Mac users: Lexicon files are in the same format as the "word list" files used by my defunct Anagrams app, so there's no problem using those word lists with Ag.

Using a text file as the lexicon

The -t option allows any text file (in UTF-8 encoding) to be used as the lexicon:

   agc -t foo.txt -p        (prints the unique words in foo.txt)
   agc -t foo.txt andrew    (generates anagrams using the words in foo.txt)

All of the unique words in the given file will be extracted, but only if they are "valid" words. A valid word is a contiguous sequence of 1 to 30 lowercase letters (see the previous section for the set of valid letters) delimited by any of these characters:
   NUL to space (this includes TAB, CR, LF) and !"(),.:;?¿¡«»… —“”
The fourth-from-last character in that list is a non-breaking space.
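
The extraction rule above can be approximated in a few lines of Python. This is a sketch written from the description, not Ag's actual reader: the function name extract_words is hypothetical, tokens containing uppercase letters, digits or apostrophes are assumed to be skipped rather than converted, and the result is sorted by code point, which may differ from Ag's own ordering.

```python
import unicodedata

# Illustrative sketch only; not Ag's actual text-file reader.
VALID = set("abcdefghijklmnopqrstuvwxyz" +
            unicodedata.normalize("NFC", "áàâäãåçéèêëíìîïñóòôöõúùûüßæøœÿı"))

# Delimiters: NUL to space, then ! " ( ) , . : ; ? and the listed
# non-ASCII punctuation, written as escapes (\u00a0 is the non-breaking space).
DELIMS = ("".join(chr(c) for c in range(0x21)) + "!\"(),.:;?" +
          "\u00bf\u00a1\u00ab\u00bb\u2026\u00a0\u2014\u201c\u201d")
TO_SPACE = str.maketrans({c: " " for c in DELIMS})

def extract_words(text):
    """Return the sorted unique valid words found in text."""
    text = unicodedata.normalize("NFC", text)
    tokens = text.translate(TO_SPACE).split(" ")
    # Assumption: a token is kept only if it is 1 to 30 valid letters long.
    words = {t for t in tokens
             if 1 <= len(t) <= 30 and all(c in VALID for c in t)}
    return sorted(words)

print(extract_words("the cat, the dog; it's 4pm"))
```

In the sample call, "it's" and "4pm" are dropped because the apostrophe and the digit are neither valid letters nor delimiters, and the repeated "the" is kept once.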

The -o option can be used to save the current lexicon in a new lexicon file. You'll get an error message if you try to overwrite an existing file. Some examples:

   agc -t foo.txt -o Foo    (creates a lexicon file with the words in foo.txt)
   agc -o NewWords          (copies the default lexicon file)

Well done if you managed to read this far. Now go have some fun!

Author: Andrew Trevorrow (andrew@trevorrow.com) aka Overt Word Warren.