The Isobel Project

Isobel scheduling model (Draft)

Definitions:

Site:
A site is a set of entry points. For each site you can define parameters that limit the load on the site itself, such as the maximum download size per time interval or the minimum delay between fetches. Every site is assigned to a separate thread.
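
To make the two load limits concrete, here is a minimal Python sketch of how a site could check them before each fetch. It is not taken from Isobel: the names SiteLimits, Site, may_fetch and record_fetch are hypothetical, and the fixed-window byte budget is only one possible reading of "maximum download size per time interval".

```python
from dataclasses import dataclass, field


@dataclass
class SiteLimits:
    """Hypothetical per-site load limits; field names are illustrative."""
    max_bytes_per_interval: int  # download budget per interval
    interval_seconds: float      # length of the budget interval
    min_fetch_delay: float       # minimum seconds between two fetches


@dataclass
class Site:
    name: str
    limits: SiteLimits
    entry_points: list = field(default_factory=list)
    _window_start: float = 0.0
    _bytes_in_window: int = 0
    _last_fetch: float = float("-inf")

    def may_fetch(self, now: float, size: int) -> bool:
        """Return True if fetching `size` bytes at time `now` respects both limits."""
        if now - self._window_start >= self.limits.interval_seconds:
            # A new interval has begun (assumed fixed window): reset the budget.
            self._window_start = now
            self._bytes_in_window = 0
        if now - self._last_fetch < self.limits.min_fetch_delay:
            return False  # too soon after the previous fetch
        return self._bytes_in_window + size <= self.limits.max_bytes_per_interval

    def record_fetch(self, now: float, size: int) -> None:
        """Account for a completed fetch of `size` bytes at time `now`."""
        self._bytes_in_window += size
        self._last_fetch = now
```

A crawler thread for the site would call may_fetch before each download and record_fetch after it; times are plain seconds here so the sketch stays independent of any clock source.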

EntryPoint:
The starting point for a crawl. Every entry point belongs to a site. Each entry point can have its own configuration parameters, such as the fetcher type to start with or the pipeline application to use. Each entry point also has a scheduling type and time.

Scheduling model:
Isobel schedules entry points. Entry point scheduling is defined in the configuration.
For each entry point you specify:

  1. Schedule start time
  2. Schedule end time
  3. Start precision
  4. Priority

How entry points are actually scheduled depends on the scheduling algorithm used by the system. You can implement your own scheduling algorithm. Scheduling algorithms have access to configuration, statistical and history data about entry points.
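
As an illustration of a pluggable scheduling algorithm, the following Python sketch models the four per-entry-point settings and a small interface that a custom algorithm could implement. Everything here is an assumption made for the example: the class and method names (EntryPoint, SchedulingAlgorithm, pick, HighestPriorityFirst), the reading of start precision as a tolerated deviation from the start time, and the convention that a larger priority value is more urgent. Statistical and history data are left out for brevity.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class EntryPoint:
    url: str
    start: datetime             # schedule start time
    end: datetime               # schedule end time
    start_precision: timedelta  # assumed: tolerated deviation from `start`
    priority: int               # assumed: larger means more urgent

    def is_active(self, now: datetime) -> bool:
        return self.start <= now < self.end


class SchedulingAlgorithm(ABC):
    """Hypothetical plug-in interface for a custom scheduling algorithm."""

    @abstractmethod
    def pick(self, now: datetime,
             entry_points: Sequence[EntryPoint]) -> Optional[EntryPoint]:
        """Return the entry point to crawl next, or None if none is active."""


class HighestPriorityFirst(SchedulingAlgorithm):
    """Example algorithm: among active entry points, take the highest priority."""

    def pick(self, now, entry_points):
        active = [ep for ep in entry_points if ep.is_active(now)]
        return max(active, key=lambda ep: ep.priority, default=None)
```

A different policy, for example one that consults crawl history, would subclass SchedulingAlgorithm in the same way and receive the extra data through its constructor.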

A basic scheduler is provided. It schedules each entry point at its specified activation time; if more than one entry point is active at the same time, it schedules them in round-robin order.
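
The round-robin behaviour can be sketched in a few lines of Python. This is an illustrative model only, not the scheduler shipped with Isobel: the names EntryPoint, RoundRobinScheduler and next are hypothetical, and activation windows are given as plain numbers of seconds.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryPoint:
    name: str
    start: float  # activation time (seconds)
    end: float    # end of the active window (seconds)


class RoundRobinScheduler:
    """Cycles through entry points, skipping those that are not active."""

    def __init__(self, entry_points):
        self._queue = deque(entry_points)

    def next(self, now):
        """Return the next active entry point in rotation, or None."""
        for _ in range(len(self._queue)):
            ep = self._queue[0]
            self._queue.rotate(-1)  # move the head to the back of the queue
            if ep.start <= now < ep.end:
                return ep
        return None


a = EntryPoint("a", 0, 100)
b = EntryPoint("b", 0, 100)
c = EntryPoint("c", 50, 100)  # becomes active later than a and b
scheduler = RoundRobinScheduler([a, b, c])
first_three = [scheduler.next(0).name for _ in range(3)]  # only a and b are active
```

While only a and b are active, calls to next alternate between them; once c's activation time has passed, it joins the rotation.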



Created by manuel
Last modified 2005-10-24 11:26