I have a .csv file which contains the names of 431,359 firms in one column. Some of the names are unique while others are common but differ by punctuation, spacing or one word. For example, there exists five observations of ABC Corporation. However, each observation differs: (1) "ABC Corp." (2) "ABC; Corp." (3) " ABC Corp " (4) "ABC Corp., Inc" (5) " \ABC Corporation ". The goal is to create an adjacent column which contains the corrected firm name, "ABC Corp."
* **Standardize the names** (e.g. the above case with 5 observations). Create an adjacent column containing the edited firm names. This entails removing extraneous spacing and punctuation along with ensuring that all instances of the name are the same. In the event, there is a unique firm name, it should also be free of extraneous spacing and punctuation.
* **Add two additional columns to the file**. (1) The first column will contain a variable that is equal to 1 if the firm name was edited in anyway and zero otherwise. (2) The second column will contain a variable equal to 1 if the firm name appears more than once and zero otherwise.
If you have any further questions, please advise.