Ag is a fast, scriptable anagram generator. These notes describe how to use the
command-line version of Ag, known as "agc". The examples given below assume you've
installed the agc binary in a suitable location so you can run it by simply typing
agc in a terminal window.
Usage
Some not-so-obvious tips
How to find good anagrams
Pattern matching
Lexicon files
Using a text file as the lexicon
Numeric lexicons
Scripting functions
String arrays
Usage
If you type agc by itself you'll get version and usage information:
This is agc version 1.8 (with Lua version 5.4.7).
Usage: agc [options] word or phrase to be anagrammed [options]
Options:
-a N print at most N anagrams (default is unlimited)
-c word print anagrams containing the given word
-h print this help information
-i print anagrams with increasing word lengths
-l lexicon use the given lexicon file (default is Words.lex)
-m pattern only print lexicon/usable words that match pattern
-n N print N usable words per line (default is 10)
-o newlexicon save current lexicon in the given file
-p only print words in lexicon
-r script run the given Lua script
-t textfile use the given text file (UTF-8 encoded) as lexicon
-u only print usable words, by increasing length
-ua only print usable words, in alphabetical order
-U print all words in UPPERCASE
-w MIN,MAX minimum and maximum words in anagrams (default is 1,10)
Some not-so-obvious tips
- If you type a command without using the -l or -t options then agc will look
for a lexicon file called Words.lex in the current directory.
If it can't find Words.lex it will look for Lexicons/Words.lex.
- Any uppercase letters in the text to be anagrammed are automatically converted
to lowercase. Spaces are ignored, as is most punctuation (.,:;'").
These commands are equivalent:
agc "andrew trevorrow"
agc Andrew Trevorrow.
agc andrewtrevorrow
- The order in which you type in options and the text doesn't matter.
Spaces between options and their values are optional.
The following examples are equivalent:
agc -a 3 andrew trevorrow
agc andrew -a3 trevorrow
agc andrewtrevorrow -a3
- If the -w option is given a single number then MIN equals MAX.
This example will only generate one-word anagrams:
agc andrew -w1
- No anagrams are generated if any of these options are used:
-m, -n, -o, -p, -r, -u, -ua
- The following command saves all words in the default lexicon to a text file:
agc -p > words.txt
The words will be in alphabetical order, one word per line.
The last line will show the total number of words.
- Ag supports many non-English languages. For example:
agc -l French.lex œuvré
If the supplied text has any non-ASCII letters (as in the above example)
then they must be UTF-8 encoded, so you might need to change your shell's
settings. If using the Mac's Terminal app, open the Preferences dialog,
go to Settings > Advanced and make sure the character encoding is set to
"Unicode (UTF-8)". On Windows you'll need to enter the command "chcp 65001".
How to find good anagrams
There are plenty of programs that can generate anagrams from a given text.
Ag tries to make it easier to find interesting anagrams. It does this by
splitting the process into two steps:
- Find the usable words. These are all the words from the current lexicon
that can be made out of the letters in the supplied text. The -u or -ua
options will print out the usable words (no anagrams will be generated):
agc andrewtrevorrow -u (prints usable words by increasing length)
agc andrewtrevorrow -ua (prints usable words in alphabetical order)
For a long piece of text there might be thousands of usable words, so you
might want to use the -m option to find only the words that match a certain
pattern. This example will only print words starting with "ov":
agc andrewtrevorrow -u -m"ov*"
The following section has all the details about pattern matching.
- Select one or more usable words that look interesting and then generate
anagrams containing those words by using the -c option:
agc andrewtrevorrow -c overt
agc andrewtrevorrow -c overt -c word
The 2nd example will only generate anagrams containing both words.
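The second step can also be approximated from a script. The following is only a sketch, not something built into Ag: it calls ag.anagrams (see Scripting functions) and then keeps the anagrams that contain a chosen word, which is roughly what the -c option does on the command line. The helper name "containing", the "maxwords" setting and the fallback sample list are assumptions made for illustration; the fallback only exists so the filter can be tried in a plain Lua interpreter where the ag table is missing.

```lua
-- Keep only the anagrams that contain `word` as a whole word.
-- Each anagram is one string with its words separated by spaces.
local function containing(anagrams, word)
    local kept = {}
    for _, anagram in ipairs(anagrams) do
        -- Pad with spaces so "overt" is not found inside "overtone".
        local padded = " " .. anagram .. " "
        if padded:find(" " .. word .. " ", 1, true) then
            kept[#kept + 1] = anagram
        end
    end
    return kept
end

local anagrams
if ag then
    -- Running under agc (or Ag): ask for real anagrams.
    ag.setoption("maxwords", 3)   -- assumption: keeps the result list manageable
    anagrams = ag.anagrams("andrewtrevorrow")
else
    -- Made-up sample data, only used when the script is run without agc.
    anagrams = { "overt word warren", "overtone dwarf", "warren overt word" }
end

for _, anagram in ipairs(containing(anagrams, "overt")) do
    print(anagram)
end
```

The filter compares whole words, so it will miss matches if the "uppercase" option has been turned on while the search word is in lowercase.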
Pattern matching
The -m option can be used to print out only the lexicon words or usable words
that match a given pattern. Let's look at some simple examples:
agc andrew -u -m"*a*" (print usable words containing the letter "a")
agc andrew -ua -m"*a*" (ditto, but print the words alphabetically)
You don't need to specify -u if you use -m and supply some text
to be anagrammed. In that case it's assumed you want to match usable words:
agc andrew -m"re*" (print usable words starting with "re")
agc andrew -m"~re*" (print usable words that don't start with "re")
agc andrew -m"*re" (print usable words ending with "re")
agc andrew -m"???" (print usable words with 3 letters)
Similarly, you don't need to specify -p if you use -m without supplying any
anagram text. In that case it's assumed you want to match lexicon words:
agc -m"?9" (print lexicon words with 9 letters)
agc -m"?7-9" (print lexicon words with 7 to 9 letters)
agc -m"?7-" (print lexicon words with at least 7 letters)
agc -m"?-7" (print lexicon words with at most 7 letters)
agc -m"*[xyz]" (print lexicon words ending in "x" or "y" or "z")
agc -m"*x*&*y*" (print lexicon words containing "x" and "y")
agc -m"*(ab|xy)*" (print lexicon words containing "ab" or "xy")
agc -m"[~aeiou]-" (print lexicon words with no vowels)
Note that it's usually best to enclose pattern strings in double quotes.
This is because some of Ag's special pattern characters also have a special
meaning to the shell. Here are Ag's special characters:
* Match zero or more letters.
? Match any single letter.
[...] Match any letter in the given list; eg. [abc].
[~...] Match any letter NOT in the given list; eg. [~aeiou].
N Specifies a fixed repeat count (N is a non-negative integer).
Repeat counts are only allowed after ?, ], or a letter; eg. ?9.
M-N Specifies a variable repeat count, where M and N are optional
non-negative integers indicating the minimum and maximum counts.
If M is missing then 0 is assumed, and if N is missing then
infinity is assumed. Note that * is equivalent to ?-.
<...> Any repeat count can be enclosed in angle brackets; eg. <2-5>.
This form of a repeat count is necessary when using a numeric
lexicon (see below).
- Used inside [...] to indicate a letter range; eg. [a-z],
or to separate min and max repeat counts; eg. ?2-5.
(...) Match a sub-pattern; eg. *(ed|ing).
| Means OR. For matching alternative patterns; eg. a*|b*.
& Means AND. For matching combined patterns; eg. *a*&*b*.
~ Means NOT. Can only be the first character in the pattern
or the first character after ( or [.
! Also means NOT but best avoided when using agc.
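The same patterns can be passed to ag.lexiconwords in a script (see Scripting functions), where no shell quoting is needed. The sketch below counts how many lexicon words match a few of the patterns shown above. The helper name "count_matches" is an assumption made for illustration, and the counts depend entirely on the lexicon that is loaded.

```lua
-- Count how many lexicon words match each pattern.
-- Returns a table that maps each pattern string to its match count.
local function count_matches(patterns)
    local counts = {}
    for _, pattern in ipairs(patterns) do
        counts[pattern] = #ag.lexiconwords(pattern)
    end
    return counts
end

-- The ag table only exists when the script is run by agc (or Ag).
if ag then
    local patterns = { "?9", "?7-9", "*[xyz]", "*x*&*y*", "*(ab|xy)*", "[~aeiou]-" }
    local counts = count_matches(patterns)
    for _, pattern in ipairs(patterns) do
        print(pattern, counts[pattern])
    end
end
```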
Lexicon files
A lexicon file is a binary file containing a list of words in a special
format that allows the file to be loaded into memory very quickly.
The format also allows specific words to be found very quickly.
All lexicon files supplied with Ag have a .lex extension. It's not
strictly required, but it's a good idea to stick to that convention
when naming your own lexicon files.
All words in a lexicon consist of 1 to 30 lowercase letters, possibly
including some non-ASCII letters. The full set of valid letters is:
a to z and áàâäãåçéèêëíìîïñóòôöõúùûüßæøœÿı
Technical note: Non-ASCII letters are stored in the MacRoman encoding.
This allows lexicons to support many non-English languages (French, German,
Italian, Spanish, etc) but retain the simplicity of one byte per letter.
Although Ag uses the MacRoman encoding internally, when it prints out
words it uses the UTF-8 encoding.
Note for Mac users: Lexicon files are in the same format as the "word list"
files used by my defunct Anagrams app, so there's no problem using those
word lists with Ag.
Using a text file as the lexicon
The -t option allows any text file (in UTF-8 encoding) to be used as the lexicon:
agc -t foo.txt -p (prints the unique words in foo.txt)
agc -t foo.txt andrew (generates anagrams using the words in foo.txt)
All of the unique words in the given file will be extracted, but only if
they are "valid" words. A valid word is a contiguous sequence of 1 to 30
lowercase letters (see the previous section for the set of valid letters)
delimited by any of these characters:
NUL to space (this includes TAB, CR, LF) and !"(),.:;?¿¡«»… —“”
The 4th last character is a non-breaking space.
The -o option can be used to save the current lexicon in a new lexicon file.
You'll get an error message if you try to overwrite an existing file.
Some examples:
agc -t foo.txt -o Foo.lex (creates a lexicon file with the words in foo.txt)
agc -o NewWords.lex (copies the default lexicon file)
Numeric lexicons
A numeric lexicon contains "words" consisting entirely of digits (0 to 9)
rather than letters.
Creating a numeric lexicon is quite simple. Just use the -t option to
load a text file that begins with a digit. Ag will then extract all the
unique numbers (ie. digit strings) in the file and build a numeric lexicon.
You can add the -o option to save the results in a .lex file.
There is one potential source of confusion when entering patterns to
match numbers in a numeric lexicon. If you want to specify a repeat
count then you must surround it with angle brackets. This avoids
any ambiguity, as should be clear from the following examples:
agc -l Primes.lex -m"?3" (print all 2-digit primes ending with 3)
agc -l Primes.lex -m"?<3>" (print all 3-digit primes)
Note that angle brackets can also be used in patterns for non-numeric
lexicons, so you might prefer to use them all the time.
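As an illustration, a script could generate a suitable text file and then load it as a numeric lexicon. This is a minimal sketch under stated assumptions: the file name primes.txt, the limit of 9999 and the helper names "primes_up_to" and "write_numbers" are all made up for the example. It relies on ag.loadlexicon treating a non-lexicon file as a text file, as described under Scripting functions.

```lua
-- Return an array of all primes <= limit (sieve of Eratosthenes).
local function primes_up_to(limit)
    local composite, primes = {}, {}
    for n = 2, limit do
        if not composite[n] then
            primes[#primes + 1] = n
            for multiple = n * n, limit, n do
                composite[multiple] = true
            end
        end
    end
    return primes
end

-- Write one number per line, so the file begins with a digit.
local function write_numbers(path, numbers)
    local file = assert(io.open(path, "w"))
    file:write(table.concat(numbers, "\n"), "\n")
    file:close()
end

-- The ag table only exists when the script is run by agc (or Ag).
if ag then
    write_numbers("primes.txt", primes_up_to(9999))  -- hypothetical file name
    ag.loadlexicon("primes.txt")       -- a text file, so the numbers are extracted
    print(ag.numeric())                -- should report a numeric lexicon
    print(#ag.lexiconwords("?<3>"))    -- how many 3-digit primes were loaded
end
```

Once primes.txt exists, the command-line equivalent would be "agc -t primes.txt -o Primes.lex".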
Scripting functions
This section describes all the ag.* functions that can be used in a script
run by agc.
ag.adjacent(word1, word2)
Returns true if the given lexicon words differ by only 1 letter.
It assumes the words have the same number of letters (note that they might
not have the same number of bytes if they contain non-ASCII letters).
This function is used to speed up the creation of adjacency graphs in
Scripts/ladders.lua.
Example: if ag.adjacent("dog","dig") then print("adjacent") end
ag.anagrams(text)
Returns an array of anagrams for the given text.
Each anagram is a single string, where multiple words are separated by spaces.
See the description of ag.setoption
for how to change various options used by this function.
Example: local a = ag.anagrams("andrewtrevorrow")
ag.check()
This function does nothing in agc.
(In Ag it is used to allow a long-running script to be stopped.)
ag.clear(n)
Clears the last n lines, or the entire terminal window
if no argument is supplied.
Example: if numlines > 1000 then ag.clear() end
ag.exit(message)
Exits the script with an optional error message.
If a message is supplied then it will be printed.
If no message is supplied then nothing will be printed.
Example: if i < 0 then ag.exit("Oops!") end
ag.getchars(text)
Returns an array containing each character
(possibly non-ASCII) in the given text.
Example: local chars = ag.getchars(word)
ag.getdir(dirname)
Ignores the given string and returns ".\" on Windows or "./" on Mac/Linux.
(This function is best used in scripts run by Ag.)
ag.getstring(prompt, default)
Prints the given prompt, optionally followed by "[default = default]"
and "> " on two extra lines if a non-empty default string is supplied.
It then waits for user input and returns the entered string,
or the default string if the user just hits the enter/return key.
Example: local n = tonumber(ag.getstring("Enter a number:","100"))
ag.gui()
Returns true if Ag is running the script, or false if agc is running the script.
Example: if not ag.gui() then print("this is agc") end
ag.help(htmlfile)
Prints the given file path prefixed by "Open this HTML file: ".
Example: ag.help("results.html")
ag.lexiconpath()
Returns the path given to the -l option, or "Words.lex" if that option is not used.
Example: print("Current lexicon is", ag.lexiconpath())
ag.lexiconwords(pattern="*")
Returns an array of words from the current lexicon
that match the given pattern. If no pattern
is supplied then the array contains all the words in the lexicon.
See the description of ag.setoption
for how to change a couple of options used by this function.
Example: local zwords = ag.lexiconwords("z*")
ag.loadlexicon(filepath)
Loads the given file and sets the current lexicon used by future
calls of ag.lexiconwords, ag.anagrams and ag.usablewords.
A non-absolute path is relative to the current working directory.
If the given file is not an Ag lexicon file then it's assumed
to be a text file and the unique words or numbers will be extracted,
as described above.
Example: ag.loadlexicon("Lexicons/Spanish.lex")
ag.lower(string)
Returns the lowercase version of the given string.
Unlike Lua's string.lower function, ag.lower will correctly convert
any uppercase non-ASCII letters in the string, as long as they are
valid lexicon letters.
Example: print(ag.lower("ÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕÚÙÛÜÆØŒŸ"))
ag.note(message)
Prints the given message prefixed by "NOTE: ".
Example: ag.note("hello")
ag.numchars(text)
Returns the number of characters (possibly non-ASCII) in the given text.
Example: if ag.numchars(word) == 3 then print(word) end
ag.numeric()
Returns true if the current lexicon is a numeric lexicon.
Example: if ag.numeric() then print("lexicon is numeric") end
ag.savelexicon(filepath)
Saves the current lexicon in the given file.
A non-absolute path is relative to the current working directory.
Example: ag.savelexicon("mylexicon.lex")
ag.setoption(name, value)
Sets various options that change the behavior of later ag.* calls.
Here are all the option names, their possible values, and the ag.* functions they affect:
"minlength"    minimum word length (1 to 30)
               used by ag.anagrams, ag.usablewords
"maxlength"    maximum word length (1 to 30)
               used by ag.anagrams, ag.usablewords
"minwords"     minimum words in an anagram (1 to 50)
               used by ag.anagrams
"maxwords"     maximum words in an anagram (1 to 50)
               used by ag.anagrams
"maxanagrams"  maximum number of anagrams (0 = no limit)
               used by ag.anagrams
"increase"     anagram words increase in length (true or false)
               used by ag.anagrams
"alphabetic"   return words in alphabetical order (true or false)
               used by ag.lexiconwords, ag.usablewords
"uppercase"    return words in uppercase (true or false)
               used by ag.anagrams, ag.lexiconwords, ag.usablewords
Example: ag.setoption("maxanagrams", 1000)
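Several options are often set together before a call to ag.anagrams. The sketch below wraps ag.setoption in a small helper; the helper name "set_options" and the particular values are assumptions chosen for illustration, while the option names come from the list above.

```lua
-- Apply several options at once by calling ag.setoption for each entry.
local function set_options(options)
    for name, value in pairs(options) do
        ag.setoption(name, value)
    end
end

-- The ag table only exists when the script is run by agc (or Ag).
if ag then
    set_options({
        minlength = 3,       -- ignore 1- and 2-letter words
        maxwords = 3,        -- at most 3 words per anagram
        maxanagrams = 20,    -- stop after 20 anagrams
        uppercase = true,    -- return the words in uppercase
    })
    for _, anagram in ipairs(ag.anagrams("andrewtrevorrow")) do
        print(anagram)
    end
end
```

Because the options affect later ag.* calls, they stay in force after this point, so a longer script may need to set them again before its next search.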
ag.upper(string)
Returns the uppercase version of the given string.
Unlike Lua's string.upper function, ag.upper will correctly convert
any lowercase non-ASCII letters in the string, as long as they are
valid lexicon letters.
Example: print(ag.upper("áàâäãåçéèêëíìîïñóòôöõúùûüæøœÿ"))
ag.usablewords(text)
Returns an array of all the words that can be made
from the letters in the given text.
See the description of ag.setoption
for how to change various options used by this function.
Example: local u = ag.usablewords("retsina")
ag.wordfound(word)
Returns true if the given word exists in the current lexicon.
Example: if ag.wordfound("cat") then patcat() end
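Several of the functions above can be combined in one script. The following sketch lists the lexicon words that differ from a given word by one letter, which is the kind of neighbour search a word-ladder script needs. It is not taken from Scripts/ladders.lua; the helper name "neighbours" and the starting word are assumptions made for illustration.

```lua
-- Return the words in `candidates` that differ from `word` by one letter.
-- ag.adjacent assumes both words have the same number of letters, so the
-- candidates should all have the same length as `word`.
local function neighbours(word, candidates)
    local result = {}
    for _, other in ipairs(candidates) do
        if ag.adjacent(word, other) then
            result[#result + 1] = other
        end
    end
    return result
end

-- The ag table only exists when the script is run by agc (or Ag).
if ag then
    local word = "dog"
    if ag.wordfound(word) then
        -- "?<N>" matches every lexicon word with exactly N letters.
        local same_length = ag.lexiconwords("?<" .. ag.numchars(word) .. ">")
        print(table.concat(neighbours(word, same_length), " "))
    else
        ag.note(word .. " is not in " .. ag.lexiconpath())
    end
end
```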
String arrays
A number of ag.* functions return arrays containing strings.
These are standard Lua arrays — that is, tables indexed by integers where
the first string has index 1. All the strings use the UTF-8 encoding and might
contain non-ASCII characters, so to count the characters in a string you should
avoid using the standard Lua length operator "#" as it returns the number of
bytes. Instead, use ag.numchars to return the
number of characters:
if ag.numchars(word) > 2 then print(word) end
And to extract each character in a UTF-8 string you can use
ag.getchars:
for _, ch in ipairs(ag.getchars(word)) do print(ch) end
Note that the Lua functions string.lower and string.upper don't convert
non-ASCII characters correctly, so use ag.lower
and ag.upper instead.
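The difference between bytes and characters can be seen in a short script. This sketch groups words by their length in characters. The helper name "group_by_length" and the fallback word list are assumptions made for illustration; outside agc it falls back to utf8.len from the standard Lua 5.4 library so that it can be tried in a plain Lua interpreter.

```lua
-- Use ag.numchars under agc; fall back to Lua's utf8.len elsewhere.
local numchars = ag and ag.numchars or utf8.len

-- Group an array of words by their length in characters (not bytes).
local function group_by_length(words)
    local groups = {}
    for _, word in ipairs(words) do
        local n = numchars(word)
        groups[n] = groups[n] or {}
        table.insert(groups[n], word)
    end
    return groups
end

local word = "œuvré"
print(#word, numchars(word))   -- byte count, then character count

-- The fallback list is made-up sample data for use without agc.
local words = ag and ag.usablewords("retsina") or { "à", "œuvré", "rat", "tin" }
local groups = group_by_length(words)
for n = 1, 30 do               -- lexicon words have 1 to 30 letters
    if groups[n] then
        print(n, table.concat(groups[n], " "))
    end
end
```

The two numbers printed for "œuvré" differ because "œ" and "é" each take more than one byte in UTF-8.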
* * * * *
Well done if you managed to read this far. Now go have some fun!
Author: Andrew Trevorrow (andrew@trevorrow.com)
aka Overt Word Warren.