HTML Scraper Script (repost)

Completed. Posted Aug 18, 2009. Paid on delivery.

I need a quick script to pull down two sets of authenticated HTML pages, parse them and return the results in CSV format.

This will need to be a command line script that would run under OS X and Linux. My order of preference for the language would be 1. Python, 2. Bash Shell, or 3. Perl.

The detailed items provided include a Wireshark capture of a session covering a login, a pull of the orders listing, and pulls of two sample orders.

## Deliverables

I need a script that will combine a couple of sets of data and give me information about orders in a given date range, output in CSV format. Generally, I will run this through cron shortly after midnight to get the previous day's order information so that I can paste it into a tracking spreadsheet. However, I would also like the capability to analyze orders over a date range if desired.

Software should be a command line script with the following run time options:

-u (BrickLink username)

-p (BrickLink password)

-h (help screen showing usage examples)

-d1 (Start date, optional)

-d2 (End date, optional)

The start and end dates are optional, but if one is specified the other must be specified too. Dates are given in YYYYMMDD format.

The default usage should only include -u and -p as required parameters and the default date range should be the previous day only.
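The options above could be handled roughly as follows. This is only a sketch using Python's standard `argparse`; the script name `bl_orders.py`, the function names, and the choice to reject a start date later than the end date are assumptions, not part of the requirements.

```python
import argparse
from datetime import date, datetime, timedelta


def parse_yyyymmdd(text):
    """Convert a strict 8-digit YYYYMMDD string to a date object."""
    if len(text) != 8 or not text.isdigit():
        raise argparse.ArgumentTypeError(f"expected YYYYMMDD, got {text!r}")
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a valid date: {text!r}")


def build_parser():
    # argparse adds -h automatically; the epilog carries the usage examples.
    parser = argparse.ArgumentParser(
        prog="bl_orders.py",  # hypothetical script name
        description="Dump BrickLink order details as CSV.",
        epilog="examples:\n"
               "  bl_orders.py -u USER -p PASS\n"
               "  bl_orders.py -u USER -p PASS -d1 20090801 -d2 20090817",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("-u", dest="username", required=True,
                        help="BrickLink username")
    parser.add_argument("-p", dest="password", required=True,
                        help="BrickLink password")
    parser.add_argument("-d1", dest="start", type=parse_yyyymmdd,
                        help="start date, YYYYMMDD (requires -d2)")
    parser.add_argument("-d2", dest="end", type=parse_yyyymmdd,
                        help="end date, YYYYMMDD (requires -d1)")
    return parser


def resolve_range(start, end, today=None):
    """Return (start, end); default to the previous day when neither is given."""
    if (start is None) != (end is None):
        raise ValueError("-d1 and -d2 must be given together")
    if start is None:
        yesterday = (today or date.today()) - timedelta(days=1)
        return yesterday, yesterday
    if start > end:
        # Assumption: a reversed range is treated as an error.
        raise ValueError("-d1 must not be later than -d2")
    return start, end
```

In a cron job, `resolve_range(args.start, args.end)` with no dates given yields yesterday for both ends of the range.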

First, you'll need to get a list of order numbers from the desired date range using this URL:

[url removed, login to view]

Note that orders can be *updated*, which causes the dates (and other values) to change, but the order number will remain the same. Please sort the output so that the orders are displayed in increasing order number. This will help me catch order updates and replace the existing values so that I don't double count an order that was placed on one day and updated the next.

After getting the list of orders for the date range, you'll need to pull the order details for each order. These contain the information to be parsed for the output. Most but not all of this information is duplicated from the previous page, and would probably be easier to parse from there. However, the orderReceived listing can be modified to show less detail or show details in a different order, which would screw with the parsing. For that reason, I'd like everything to be parsed from the orderDetail page, which uses this URL:

[url removed, login to view]

Where 5555555 would represent the order numbers parsed from the previous page.
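One way to keep the login across the listing and detail requests is a cookie-aware opener from the standard library. The sketch below is not based on the actual capture: the URLs are placeholders (the real ones are removed above), and the login form field names would have to be read from the Wireshark session.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Placeholder only; the real order detail URL is not shown in this posting.
ORDER_DETAIL_TEMPLATE = "https://example.invalid/orderDetail?ID={order_id}"


def make_opener():
    """Build an opener that stores and resends cookies set at login."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, form_fields):
    """Create a POST request carrying the urlencoded login form."""
    data = urllib.parse.urlencode(form_fields).encode("ascii")
    return urllib.request.Request(login_url, data=data)


def detail_url(order_id):
    """Fill the order number into the (placeholder) detail URL."""
    return ORDER_DETAIL_TEMPLATE.format(order_id=order_id)


def fetch(opener, url_or_request):
    """Fetch a page through the logged-in opener and return it as text."""
    with opener.open(url_or_request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

A run would call `fetch` once with the login request, once for the listing page, and then once per order number with `detail_url(...)`.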

On the orderDetail page, the following text may appear at the bottom of the page:

"First batch of this order has been referred by [url removed, login to view]."

If this appears, I would like the output for Peeron to read "Y", otherwise it should read "N".
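The Peeron check can be a plain substring test on the page text. The sketch below strips tags with the standard `html.parser` first, on the assumption that the sentence may be broken up by markup such as a link; the function names are made up for illustration.

```python
from html.parser import HTMLParser

REFERRAL_PHRASE = "First batch of this order has been referred by"


class _TextExtractor(HTMLParser):
    """Collect the text nodes of a page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the page text with runs of whitespace collapsed to one space."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def peeron_flag(html):
    """Return "Y" when the referral sentence is on the page, else "N"."""
    return "Y" if REFERRAL_PHRASE in visible_text(html) else "N"
```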

All of the order info should be dumped in CSV format as the output using the following fields in the given order:

"BrickLink", Order Number, Order Date, Number of Parts, Number of Lots, Grand Total, Shipping, Insurance, Charge 1, Charge 2, Coupon, Credit, Payment Method, Username, E-mail address, Orders in Store, Peeron

The "Coupon" field can be a Y/N field. If a coupon is used, the phrase "Coupon Applied" will appear between the "Order Total" and "Buyer Information" sections.
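The coupon rule depends on position, so a sketch of it needs to look only at the text between the two section titles. The version below works on page text that has already had its tags stripped; returning "N" when either section title is missing is an assumption.

```python
def coupon_flag(page_text):
    """Return "Y" if "Coupon Applied" sits between the two section titles."""
    start = page_text.find("Order Total")
    if start == -1:
        return "N"  # assumption: no "Order Total" section means no coupon
    end = page_text.find("Buyer Information", start)
    if end == -1:
        return "N"  # assumption: same fallback when the end marker is absent
    return "Y" if "Coupon Applied" in page_text[start:end] else "N"
```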

The "BrickLink" item will be a constant that appears without quotes at the beginning of each line. This represents the source of the sale, as I sell in multiple locations, but all of these orders will come from BrickLink.
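The output step might look like the sketch below, which uses the standard `csv` module. It assumes each parsed order is a dict keyed by the field names listed above (a hypothetical intermediate format) and sorts numerically by order number as requested earlier. With the default minimal quoting, the leading `BrickLink` constant is written without quotes, while a value that itself contains a comma is quoted.

```python
import csv

# Column order from the posting; the leading "BrickLink" constant is added per row.
FIELDS = [
    "Order Number", "Order Date", "Number of Parts", "Number of Lots",
    "Grand Total", "Shipping", "Insurance", "Charge 1", "Charge 2",
    "Coupon", "Credit", "Payment Method", "Username", "E-mail address",
    "Orders in Store", "Peeron",
]


def write_orders(orders, out):
    """Write one CSV line per order, in increasing order number."""
    writer = csv.writer(out, lineterminator="\n")
    for order in sorted(orders, key=lambda o: int(o["Order Number"])):
        # Missing fields are written as empty cells (an assumption).
        writer.writerow(["BrickLink"] + [order.get(name, "") for name in FIELDS])
```

Called as `write_orders(orders, sys.stdout)`, this prints lines ready to paste into a spreadsheet; no header row is written, since the posting does not ask for one.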

Skills: Engineering, Linux, Mac OS, MySQL, Perl, PHP, Project Management, Python, Software Architecture, Software Testing

Project ID: #2841302

About the project

6 bids, remote project, active Aug 18, 2009

Awarded to:

nunos

See private message.

$63.74 USD in 14 days
(14 reviews)
3.7

6 freelancers bid an average of $58 on this project

jeremiahdodds

See private message.

$51 USD in 14 days
(26 reviews)
4.9
mreznik

See private message.

$63.75 USD in 14 days
(22 reviews)
3.9
ivicamunitic

See private message.

$42.5 USD in 14 days
(11 reviews)
3.8
jvkoder

See private message.

$63.75 USD in 14 days
(8 reviews)
2.5
pwhelan

See private message.

$63.75 USD in 14 days
(0 reviews)
0.0