ngram-merge [ -help ] [ -write outfile ] [ -float-counts ] [ -- ] infile1 infile2 ...
The input format consists of one N-gram count per line:
word1 word2 ... wordn count
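As an illustration of this line format, here is a minimal Python sketch of a parser. It is not part of ngram-merge. The function name parse_count_line and its float_counts parameter are hypothetical. The sketch assumes that fields are separated by whitespace, that the last field is the count, and that fractional counts are what the -float-counts option refers to.

```python
def parse_count_line(line, float_counts=False):
    """Split one count line into (ngram, count).

    Assumption: fields are whitespace-separated and the final
    field is the count; everything before it is the N-gram.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected at least one word and a count: %r" % line)
    *words, raw_count = fields
    # Hypothetical switch for fractional counts; integers otherwise.
    count = float(raw_count) if float_counts else int(raw_count)
    return tuple(words), count
```

For example, the line "the quick fox 3" would be read as the trigram ("the", "quick", "fox") with count 3.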
Each filename argument can be a plain ASCII count file, or a compressed file (name ending in .Z or .gz), or ``-'' to indicate stdin/stdout.
ngram-merge is recommended when the full set of counts would far exceed available real memory. Although an arbitrary number of input count files is accepted, it is best to use the program as follows. First, partition the input text into the largest chunks for which ngram-count can still run in real memory. Then merge the resulting sorted counts pairwise with ngram-merge, and continue in a binary tree pattern until a single count file containing all N-grams remains. The make-batch-counts and merge-batch-counts scripts automate this procedure.
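The pairwise, binary-tree merge described above can be sketched in Python as follows. This is an in-memory illustration of the idea only, not the ngram-merge implementation or the batch scripts. The names merge_sorted_counts and merge_tree are hypothetical, each chunk is assumed to be a list of (ngram, count) pairs already sorted by N-gram, and Python's tuple ordering stands in for whatever sort order the real count files use.

```python
from heapq import merge
from itertools import groupby
from operator import itemgetter


def merge_sorted_counts(a, b):
    """Merge two sorted sequences of (ngram, count) pairs.

    Counts of N-grams that occur in both inputs are summed.
    """
    merged = merge(a, b, key=itemgetter(0))
    for ngram, group in groupby(merged, key=itemgetter(0)):
        yield ngram, sum(count for _, count in group)


def merge_tree(chunks):
    """Merge chunks pairwise, level by level, until one remains."""
    level = [list(chunk) for chunk in chunks]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                pair = merge_sorted_counts(level[i], level[i + 1])
                next_level.append(list(pair))
            else:
                # Odd chunk out: carry it up to the next level unchanged.
                next_level.append(level[i])
        level = next_level
    return level[0] if level else []
```

Each merge only needs to look at the current head of its two inputs, which is why sorted inputs let the real tool work on counts that do not fit in memory. The sketch holds everything in lists purely for brevity.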