I have a 25 mB file with 4 fields (see below) and 900,000 records. At the moment it's a tab-delimited .DAT file. I need to eliminate most of the records and to reduce the file to three fields instead of four, following some simple rules:
STEP 1 - Eliminate unnecessary records
1) If Field 3 or Field 4 is blank, delete the record and look at the next record.
2) If Field 3 or Field 4 contains any alphabet letters (A-Z), delete the record and look at the next record.
3) If the value in Field 2 of the current record matches the value in Field 2 of the previous record, delete the current record (but not the previous record), then look at the next record.
4) Repeat steps 1-3 until all of the records have been examined.
STEP 2 - Combine Fields 1 & 2
1) For all remaining records, combine Fields 1 and 2 into one Field, so if Field 1 contains "US" and Field 2 contains "92084" the new combined field should be: "US92084"
--------------------
Here are the Fields in the original .dat file:
CountryCode MailCode Longitude Latitude
US 30183 34.3217 -84.562
US 30127 33.9135
US 31535 31.5046 -82.8246
The compressed file should look like this:
CCMailCode Longitude Latitude
US30183 34.3217 -84.562
US31535 31.5046 -82.8246
Also, I'd like to keep a copy of the program (so maybe it should be an .EXE file), because I might need to run it again in a year or so on a similar array. The program can be written in any language, I suppose. If it's in PHP, that would a bonus, so I can edit it. Thanks!
Pure C application (will deliver both the executable and source code) or Lua (which you can edit and run without recompiling.) Fully commented source code in both cases. No GUI, command line only. Delivery in two hours after acceptance.