Dish Catalog - Project Brief (15-Sept-2013)
1) Project background
We want to build a list of dishes from the page source code of food.com, together with each dish's associated keywords, ingredients, and the other information described below.
We need bots to crawl each recipe page and collect the relevant information. The output should be delivered in MS Excel or MS Access, with one dish per row.
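As a rough illustration of the one-dish-per-row output, the sketch below writes dish records to CSV, which both MS Excel and MS Access can import. The column names, the write_dishes helper, and the " | " separator for multi-valued fields are assumptions made for this example, not part of the brief.

```python
import csv
import io

# Hypothetical column names, one per field requested in this brief.
COLUMNS = [
    "recipe_page_url", "dish_name", "dish_photo_url",
    "total_time", "prep_time", "cook_time",
    "ingredients", "directions", "keywords",
]


def write_dishes(dishes, out):
    """Write one dish per row to the file-like object `out`.

    List-valued fields (e.g. ingredients) are joined with " | " so that
    each dish still occupies a single row. Missing fields are left blank.
    """
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for dish in dishes:
        row = {}
        for col in COLUMNS:
            value = dish.get(col, "")
            if isinstance(value, (list, tuple)):
                value = " | ".join(value)
            row[col] = value
        writer.writerow(row)


if __name__ == "__main__":
    # Made-up record, for demonstration only.
    buf = io.StringIO()
    write_dishes([{"dish_name": "Example dish", "ingredients": ["olive oil", "onions"]}], buf)
    print(buf.getvalue())
```

When writing to a real file rather than an in-memory buffer, open it with newline="" as the csv module documentation recommends.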
2) What information to collect from the page source code
- recipe page URL
- dish name
- dish photo URL
- Total Time
- Prep Time
- Cook Time
- list of ingredients (without hyperlink)
- Directions (without hyperlink)
- Full list of keywords in the source code, like:
content="time-to-make,course,main-ingredient,preparation,healthy,main-dish,poultry,easy,low-fat,chicken,dietary,low-cholesterol,low-saturated-fat,low-calorie,oamc-freezer-make-ahead,healthy-2,low-in-something,meat,chicken-breasts,number-of-servings,4-hours-or-less,boneless skinless chicken breasts,olive oil,garlic cloves,green peppers,onions,soy sauce,pineapple chunks,vinegar,brown sugar,gingerroot,cornstarch,cooked rice recipe, recipes, recipies,time-to-make,course,main-ingredient,preparation,healthy,main-dish,poultry,easy,low-fat,chicken,dietary,low-cholesterol,low-saturated-fat,low-calorie,oamc-freezer-make-ahead,healthy-2,low-in-something,meat,chicken-breasts,number-of-servings,4-hours-or-less,time-to-make,course,main-ingredient,preparation,healthy,main-dish,poultry,easy,low-fat,chicken,dietary,low-cholesterol,low-saturated-fat,low-calorie,oamc-freezer-make-ahead,healthy-2,low-in-something,meat,chicken-breasts,number-of-servings,4-hours-or-less,time-to-make,course,main-ingredient,preparation,healthy,main-dish,poul"
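The keyword field above could be pulled out of the page source along these lines. This is a minimal sketch that assumes the content="..." attribute shown above belongs to a meta tag with name="keywords"; the brief shows only the attribute, so the tag name is a guess, and KeywordMetaParser and extract_keywords are hypothetical names introduced for this example.

```python
from html.parser import HTMLParser


class KeywordMetaParser(HTMLParser):
    """Collect comma-separated entries from <meta name="keywords" content="...">.

    Assumption: the keyword list lives in a meta tag named "keywords".
    """

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "keywords":
            return
        content = attr.get("content") or ""
        for item in content.split(","):
            item = item.strip()
            if item:  # skip empty entries left by stray commas
                self.keywords.append(item)


def extract_keywords(page_source):
    """Return the full keyword list, in page order, duplicates included."""
    parser = KeywordMetaParser()
    parser.feed(page_source)
    return parser.keywords


if __name__ == "__main__":
    sample = '<head><meta name="keywords" content="main-dish,poultry,olive oil"></head>'
    print(extract_keywords(sample))
```

Duplicates are kept on purpose, since the brief asks for the full list as it appears in the source. The other fields (times, ingredients, directions) depend on page markup that this brief does not show, so they are not sketched here.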