It is not clear in which language you require it to be written. Where are you going to run it?
I would do it in stages.
1st stage:
- Read a plain-text list of word-to-symbol mappings into a sorted list.
- Process the input text, using the list to transform words into symbols.
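To make the first stage concrete, here is a minimal sketch in Python, chosen only for illustration since the language is still open. The file format (one `word symbol` pair per line, `#` for comments), the case-sensitive whole-word matching, and the names `load_pairs`, `lookup` and `transform` are all my assumptions, not requirements you stated.

```python
import bisect
import re


def load_pairs(lines):
    """Parse 'word symbol' lines into a list of (word, symbol) sorted by word.

    The line format is an assumption: word, whitespace, symbol.
    Blank lines and lines starting with '#' are ignored.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        word, symbol = line.split(None, 1)
        pairs.append((word, symbol))
    pairs.sort()
    return pairs


def lookup(pairs, word):
    """Binary-search the sorted list; return the symbol, or None if absent."""
    i = bisect.bisect_left(pairs, (word,))
    if i < len(pairs) and pairs[i][0] == word:
        return pairs[i][1]
    return None


def transform(text, pairs):
    """Replace each word that has an entry; leave everything else untouched."""
    def replace(match):
        symbol = lookup(pairs, match.group(0))
        return match.group(0) if symbol is None else symbol

    return re.sub(r"\w+", replace, text)
```

For example, with the entries `and &`, `at @` and `percent %`, the text `cats and dogs at 5 percent` would become `cats & dogs @ 5 %`. In a real run the lines would come from the mapping file, e.g. `load_pairs(open(path, encoding="utf-8"))`.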
2nd stage:
- Implement the reverse function, i.e. building the inverse list (easy; it requires very little new code on top of the previous stage).
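A possible sketch of the inverse step, again in Python purely for illustration. It assumes the mapping is held as (word, symbol) pairs and that symbols can be found as plain substrings of the text; `invert` and `restore` are hypothetical names, and `restore` uses a dictionary internally just to keep the example short.

```python
import re


def invert(pairs):
    """Swap each (word, symbol) pair and sort the result by symbol."""
    inverse = sorted((symbol, word) for word, symbol in pairs)
    symbols = [symbol for symbol, _ in inverse]
    # Two words sharing one symbol cannot be told apart on the way back.
    if len(set(symbols)) != len(symbols):
        raise ValueError("duplicate symbol; the mapping is not reversible")
    return inverse


def restore(text, inverse):
    """Replace every symbol in the text with its original word."""
    table = dict(inverse)
    if not table:
        return text
    # Try longer symbols first so a short symbol never shadows a longer one.
    ordered = sorted(table, key=len, reverse=True)
    pattern = "|".join(re.escape(symbol) for symbol in ordered)
    return re.sub(pattern, lambda m: table[m.group(0)], text)
```

The duplicate check matters: if two words map to the same symbol, the reverse direction is ambiguous, and that is something we would need to settle in the requirements.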
3rd stage:
- Once the code from the previous stage does what you want, I would add functionality to read from and write to several files in a single run (trivial).
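Handling several files could look roughly like the sketch below (Python for illustration only). It assumes UTF-8 text files and that each input file has its own output file; `process_files` and the `jobs` parameter are hypothetical names, and `transform` stands for whatever text-to-text function the earlier stages produce.

```python
from pathlib import Path


def process_files(jobs, transform):
    """Apply `transform` to each input file and write the result to its output.

    `jobs` is an iterable of (input_path, output_path) pairs.
    Returns the list of output paths that were written.
    """
    written = []
    for src, dst in jobs:
        text = Path(src).read_text(encoding="utf-8")
        Path(dst).write_text(transform(text), encoding="utf-8")
        written.append(Path(dst))
    return written
```

A call such as `process_files([("a.txt", "a.out"), ("b.txt", "b.out")], str.upper)` would process both files in one run, here with `str.upper` standing in for the real transformation.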
4th stage:
- Add functionality to skip parts of the text.
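How the parts to skip are marked is still to be defined, so the following Python sketch is only one possible reading. It assumes, hypothetically, that skipped regions are wrapped in `[[` and `]]`; the marker syntax and the name `transform_outside_skips` are my inventions for the example.

```python
import re

# Hypothetical marker syntax: anything between [[ and ]] is left alone.
SKIP = re.compile(r"\[\[.*?\]\]", re.DOTALL)


def transform_outside_skips(text, transform):
    """Apply `transform` only to the text outside the skip markers."""
    out = []
    pos = 0
    for match in SKIP.finditer(text):
        out.append(transform(text[pos:match.start()]))
        out.append(match.group(0))  # copied through unchanged, markers included
        pos = match.end()
    out.append(transform(text[pos:]))
    return "".join(out)
```

With `str.upper` standing in for the real transformation, `a [[b]] c` would come out as `A [[b]] C`. If the skipped parts are identified some other way (by line range, by section heading, and so on), only the `SKIP` pattern and the loop would change.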
Of course, we can further refine the requirements.
Also, if you are processing extremely large quantities of data, it could later be improved by using a specialized tree instead of a list, by pre-compiling it so that the cost of building it isn't incurred on each run, and by processing files in parallel (or, for large files, by dividing a file into blocks and processing those in parallel).
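As an illustration of the tree idea, one candidate is a trie (prefix tree); that is my suggestion for the example rather than a fixed choice. The Python sketch below builds it from (word, symbol) pairs using nested dictionaries, with `build_trie` and `trie_lookup` as hypothetical names.

```python
def build_trie(pairs):
    """Build a trie of nested dicts from (word, symbol) pairs."""
    root = {}
    for word, symbol in pairs:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        # The empty-string key marks the end of a word; it cannot clash
        # with the single-character keys used for child nodes.
        node[""] = symbol
    return root


def trie_lookup(root, word):
    """Walk the trie one character at a time; return the symbol or None."""
    node = root
    for ch in word:
        node = node.get(ch)
        if node is None:
            return None
    return node.get("")
```

Because this trie is just nested dictionaries, it could be saved once with the standard `pickle` module and reloaded on later runs, which is one way to do the pre-compiling. Whether it actually beats the sorted list would have to be measured on your data.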
Please consider this proposal. Thanks in advance.