
6.3. anticipatory Scheduler

An application that issues a read request for a specific disk block may also issue a request for the next disk block after a certain think time. However, in most cases, by the time the request for the next disk block is issued, the disk head may have already moved further past. This results in additional latency for the application.
To address this, the anticipatory scheduler enforces a delay after servicing an I/O request before moving to the next request. This gives an application a window within which to submit another I/O request. If the next I/O request is for the next disk block (as anticipated), the anticipatory scheduler helps ensure that it is serviced before the disk head has a chance to move past the targeted disk block.
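The waiting decision described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel's implementation: the function choose_next, its parameters, and the nearest-block fallback are all hypothetical, and block numbers stand in for disk head positions.

```python
def choose_next(head, pending, anticipated, arrival_ms, window_ms):
    """Pick the next block to service in a toy anticipatory model.

    head        -- block the disk head is currently on
    pending     -- non-empty list of blocks already queued by other work
    anticipated -- block the same application is expected to request next
    arrival_ms  -- delay after the last request until that request arrives
    window_ms   -- how long the scheduler is willing to stay idle
    """
    if arrival_ms <= window_ms:
        # The anticipated request arrived inside the window, so it is
        # serviced before the head moves elsewhere.
        return anticipated
    # The window elapsed with no new request: fall back to the queued
    # block closest to the head (an assumption made for this sketch).
    return min(pending, key=lambda block: abs(block - head))
```

With the head on block 100 and blocks 900 and 500 queued, choose_next(100, [900, 500], 101, arrival_ms=3, window_ms=6) returns 101, because the follow-up request arrives within the window. With arrival_ms=10 the window expires first and the call returns 500.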
Read and write requests are dispatched and serviced in batches. The anticipatory scheduler alternates between dispatching/servicing batches of read and write requests. The frequency, amount of time and priority given to each batch type depend on the settings configured in /sys/block/<device>/queue/iosched/.
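The alternation between read and write batches can be pictured as a sequence of time slices. The sketch below is a simplified model that assumes both kinds of request are always pending; the function batch_schedule and its two time budgets (read_batch_ms, write_batch_ms) are hypothetical names chosen for illustration, and the code does not reproduce the scheduler's real dispatch logic.

```python
def batch_schedule(total_ms, read_batch_ms, write_batch_ms):
    """Return (kind, duration_ms) slices that alternate read and write batches.

    Assumes read and write requests are always waiting, so every batch
    uses its full time budget until total_ms runs out.
    """
    if read_batch_ms <= 0 or write_batch_ms <= 0:
        raise ValueError("batch budgets must be positive")
    slices = []
    elapsed = 0
    kind = "read"
    while elapsed < total_ms:
        budget = read_batch_ms if kind == "read" else write_batch_ms
        # The last slice is cut short if the total time runs out.
        duration = min(budget, total_ms - elapsed)
        slices.append((kind, duration))
        elapsed += duration
        kind = "write" if kind == "read" else "read"
    return slices
```

For instance, batch_schedule(1000, 500, 125) yields a 500 ms read batch, a 125 ms write batch, and a final 375 ms read batch. A larger read budget relative to the write budget gives reads a larger share of the disk's time.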
The cost of using the anticipatory scheduler is the overall latency caused by numerous enforced delays. You should consider this trade-off when assessing the suitability of the anticipatory scheduler for your system. In most small systems that use applications with many dependent reads, the improvement in throughput from using the anticipatory scheduler significantly outweighs the overall latency.
The anticipatory scheduler tends to be recommended for servers running data processing applications that are not regularly interrupted by external requests. Examples of these are servers dedicated to compiling software. The anticipatory scheduler performs well on most personal workstations, but very poorly for most other server-type workloads.
The tunable variables for the anticipatory scheduler are set in files found under /sys/block/<device>/queue/iosched/. These files are:
read_expire
The amount of time (in milliseconds) before each read I/O request expires. Read requests are generally more important than write requests; as such, it is advisable to set a shorter expiration time for read_expire than for write_expire. In most cases, this is half of write_expire.
For example, if write_expire is set at 248, it is advisable to set read_expire to 124.
write_expire
The amount of time (in milliseconds) before each write I/O request expires.
read_batch_expire
The amount of time (in milliseconds) that the I/O subsystem should spend servicing a batch of read requests before servicing pending write batches (if there are any). read_batch_expire is typically set to a multiple of read_expire.
write_batch_expire
The amount of time (in milliseconds) that the I/O subsystem should spend servicing a batch of write requests before servicing pending read batches (if there are any).
antic_expire
The amount of time (in milliseconds) to wait for an application to issue another I/O request before moving on to a new request.
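The tunables listed above are plain files, so they can be inspected with a few lines of code. The following is a minimal sketch, assuming each file in the iosched directory holds a single integer; the helper names read_tunables and suggested_read_expire are hypothetical, and the second helper only applies the half-of-write_expire guideline given for read_expire.

```python
from pathlib import Path


def read_tunables(iosched_dir):
    """Return {file name: integer value} for every regular file in iosched_dir.

    Assumes each file contains one integer, as the millisecond
    tunables described above do.
    """
    values = {}
    for entry in sorted(Path(iosched_dir).iterdir()):
        if entry.is_file():
            values[entry.name] = int(entry.read_text().strip())
    return values


def suggested_read_expire(write_expire_ms):
    """Apply the guideline that read_expire is half of write_expire."""
    return write_expire_ms // 2
```

On a system where the anticipatory scheduler is active for a device, the directory to pass would be /sys/block/<device>/queue/iosched with <device> replaced by the device name. Calling suggested_read_expire(248) returns 124, matching the example given for read_expire. Changing a tunable means writing a new value to its file, which normally requires root privileges; this sketch deliberately only reads.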