Save Information to DB at Crawling
- From: "mich dobelman" <sss@xxxxxxxx>
- Date: Wed, 16 Aug 2006 09:33:53 GMT
I am trying to write a crawler that grabs information and stores it
in a database.
The web site is structured as follows:
REGION
CATEGORY
PROPERTY LISTING
The site has about 50 regions, each region has 20 categories or fewer,
and a single category can hold as many as 2000 properties (each listing
page displays 20 properties). To get all the information, my crawler
visits each property page and uses regular expressions to extract
specific fields (price, bedrooms (BR), contact info, etc.).
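To make the extraction step concrete, here is a minimal sketch in Python. The sample markup, the class names, and the `extract_property` helper are all hypothetical, since the real site's HTML is not shown here; only the idea of pulling fields out of a property page with regular expressions comes from the description above.

```python
import re

# Hypothetical markup: the real site's HTML is not shown in the post.
SAMPLE_PAGE = """
<h1>Sunny Villa</h1>
<span class="price">$250,000</span>
<span class="br">3 BR</span>
<span class="contact">555-0100</span>
"""

# One compiled pattern per field; each captures the value in group 1.
PATTERNS = {
    "price": re.compile(r'class="price">\$([\d,]+)<'),
    "bedrooms": re.compile(r'class="br">(\d+)\s*BR<'),
    "contact": re.compile(r'class="contact">([^<]+)<'),
}


def extract_property(html):
    """Return a dict of extracted fields; fields not found map to None."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(html)
        result[name] = match.group(1).strip() if match else None
    # Convert numeric fields so they can be stored in numeric columns.
    if result["price"] is not None:
        result["price"] = int(result["price"].replace(",", ""))
    if result["bedrooms"] is not None:
        result["bedrooms"] = int(result["bedrooms"])
    return result
```

Returning None for a field that does not match lets the caller decide whether to store a partial record or to log the page for inspection.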
I am having trouble deciding when and where to save the data to the
database. Note that the crawler is scheduled to visit the web site
every day, and if a property's information has not changed since its
last modification date, the crawler skips that property.
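One possible arrangement, offered only as a sketch, is to save each property as soon as it is parsed and to commit once per listing page (up to 20 properties). The code below uses Python's built-in sqlite3 module purely to stay runnable; the real target database may differ. The cut-down table, the `last_modified` column, and the function names are assumptions for illustration. It also assumes the last modification date is available before the full record is saved.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a cut-down property table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE property ("
        " id INTEGER PRIMARY KEY,"
        " price INTEGER,"
        " last_modified TEXT)"  # assumed ISO date string, e.g. '2006-08-16'
    )
    return conn


def save_if_changed(conn, prop):
    """Insert or update one property; return False if it was skipped."""
    row = conn.execute(
        "SELECT last_modified FROM property WHERE id = ?", (prop["id"],)
    ).fetchone()
    if row is not None and row[0] == prop["last_modified"]:
        return False  # unchanged since the last crawl, so skip it
    if row is None:
        conn.execute(
            "INSERT INTO property (id, price, last_modified) VALUES (?, ?, ?)",
            (prop["id"], prop["price"], prop["last_modified"]),
        )
    else:
        conn.execute(
            "UPDATE property SET price = ?, last_modified = ? WHERE id = ?",
            (prop["price"], prop["last_modified"], prop["id"]),
        )
    return True


def save_listing_page(conn, properties):
    """Save one listing page and commit once; return the number saved."""
    saved = sum(save_if_changed(conn, p) for p in properties)
    conn.commit()
    return saved
```

Committing once per listing page keeps transactions small, so an interrupted crawl loses at most one page of work, while avoiding a commit for every single row.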
I created the following tables to store this information:
Region Table
ID, Region Name
Category Table
ID(1~20), Category Name
Property Table
ID, Category, Name, Address, Price, Contact Info, Bedrooms,
Location (Lat), Location (Lon)
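The three tables above could be written out roughly as follows. This is a sketch using Python's built-in sqlite3 module as a stand-in for the real database, with assumed column types. The `region_id` and `last_modified` columns are not in the list above; they are added here as assumptions, because the Property table otherwise has no link to a region and nothing to compare against when deciding whether to skip a property.

```python
import sqlite3

SCHEMA = """
CREATE TABLE region (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE category (
    id   INTEGER PRIMARY KEY,   -- 1 to 20 in the description above
    name TEXT NOT NULL
);
CREATE TABLE property (
    id            INTEGER PRIMARY KEY,
    region_id     INTEGER REFERENCES region(id),   -- assumption: not in the post
    category_id   INTEGER NOT NULL REFERENCES category(id),
    name          TEXT,
    address       TEXT,
    price         INTEGER,
    contact_info  TEXT,
    bedrooms      INTEGER,
    latitude      REAL,
    longitude     REAL,
    last_modified TEXT                              -- assumption: not in the post
);
"""


def create_schema(conn):
    """Create the region, category, and property tables."""
    conn.executescript(SCHEMA)
```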