mirror of
https://github.com/Stichting-MINIX-Research-Foundation/netbsd.git
synced 2025-08-09 22:20:19 -04:00
443 lines
11 KiB
Groff
443 lines
11 KiB
Groff
.\" $NetBSD: bufferio.9,v 1.16 2015/03/30 14:09:04 riastradh Exp $
|
|
.\"
|
|
.\" Copyright (c) 2015 The NetBSD Foundation, Inc.
|
|
.\" All rights reserved.
|
|
.\"
|
|
.\" This code is derived from software contributed to The NetBSD Foundation
|
|
.\" by Taylor R. Campbell.
|
|
.\"
|
|
.\" Redistribution and use in source and binary forms, with or without
|
|
.\" modification, are permitted provided that the following conditions
|
|
.\" are met:
|
|
.\" 1. Redistributions of source code must retain the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer.
|
|
.\" 2. Redistributions in binary form must reproduce the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer in the
|
|
.\" documentation and/or other materials provided with the distribution.
|
|
.\"
|
|
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
|
|
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
|
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
|
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
|
|
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
|
|
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
|
|
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
|
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
|
|
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
|
|
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
|
|
.\" POSSIBILITY OF SUCH DAMAGE.
|
|
.\"
|
|
.Dd March 29, 2015
|
|
.Dt BUFFERIO 9
|
|
.Os
|
|
.Sh NAME
|
|
.Nm BUFFERIO ,
|
|
.Nm biodone ,
|
|
.Nm biowait ,
|
|
.Nm getiobuf ,
|
|
.Nm putiobuf ,
|
|
.Nm nestiobuf_setup ,
|
|
.Nm nestiobuf_done
|
|
.Nd block I/O buffer transfers
|
|
.Sh SYNOPSIS
|
|
.In sys/buf.h
|
|
.Ft void
|
|
.Fn biodone "struct buf *bp"
|
|
.Ft int
|
|
.Fn biowait "struct buf *bp"
|
|
.Ft struct buf *
|
|
.Fn getiobuf "struct vnode *vp" "bool waitok"
|
|
.Ft void
|
|
.Fn putiobuf "struct buf *bp"
|
|
.Ft void
|
|
.Fn nestiobuf_setup "struct buf *mbp" "struct buf *bp" "int offset" \
|
|
"size_t size"
|
|
.Ft void
|
|
.Fn nestiobuf_done "struct buf *mbp" "int donebytes" "int error"
|
|
.Sh DESCRIPTION
|
|
The
|
|
.Nm
|
|
subsystem manages block I/O buffer transfers, described by the
|
|
.Vt "struct buf"
|
|
structure, which serves multiple purposes between users in
|
|
.Nm ,
|
|
users in
|
|
.Xr buffercache 9 ,
|
|
and users in block device drivers to execute transfers to physical
|
|
disks.
|
|
.Sh BLOCK DEVICE USERS
|
|
Users of
|
|
.Nm
|
|
wishing to submit a buffer for block I/O transfer must obtain a
|
|
.Vt "struct buf" ,
|
|
e.g. via
|
|
.Fn getiobuf ,
|
|
fill its parameters, and submit it to a block device with
|
|
.Xr bdev_strategy 9 ,
|
|
usually via
|
|
.Xr VOP_STRATEGY 9 .
|
|
.Pp
|
|
The parameters to an I/O transfer described by
|
|
.Fa bp
|
|
are specified by the following
|
|
.Vt "struct buf"
|
|
fields:
|
|
.Bl -tag -offset abcd
|
|
.It Fa bp Ns Li "->b_flags"
|
|
Flags specifying the type of transfer.
|
|
.Bl -tag -compact
|
|
.It Dv B_READ
|
|
Transfer is read from device.
|
|
If not set, transfer is write to device.
|
|
.It Dv B_ASYNC
|
|
Asynchronous I/O.
|
|
Caller must not provide
|
|
.Fa bp Ns Li "->b_iodone"
|
|
and must not call
|
|
.Fn biowait bp .
|
|
.El
|
|
For legibility, callers should indicate writes by passing the
|
|
pseudo-flag
|
|
.Dv B_WRITE ,
|
|
which is zero.
|
|
.It Fa bp Ns Li "->b_data"
|
|
Pointer to kernel virtual address of source/target for transfer.
|
|
.It Fa bp Ns Li "->b_bcount"
|
|
Nonnegative number of bytes requested for transfer.
|
|
.It Fa bp Ns Li "->b_blkno"
|
|
Block number at which to do transfer.
|
|
.It Fa bp Ns Li "->b_iodone"
|
|
I/O completion callback.
|
|
.Dv B_ASYNC
|
|
must not be set in
|
|
.Fa bp Ns Li "->b_flags" .
|
|
.El
|
|
.Pp
|
|
Additionally, if the I/O transfer is a write associated with a
|
|
.Xr vnode 9
|
|
.Fa vp ,
|
|
then before the user submits it to a block device, the user must
|
|
increment
|
|
.Fa vp Ns Li "->v_numoutput" .
|
|
The user must not acquire
|
|
.Fa vp Ns Ap s
|
|
vnode lock between incrementing
|
|
.Fa vp Ns Li "->v_numoutput"
|
|
and submitting
|
|
.Fa bp
|
|
to a block device -- doing so will likely cause deadlock with the
|
|
syncer.
|
|
.Pp
|
|
Block I/O transfer completion may be notified by the
|
|
.Fa bp Ns Li "->b_iodone"
|
|
callback, by signalling
|
|
.Fn biowait
|
|
waiters, or not at all in the
|
|
.Dv B_ASYNC
|
|
case.
|
|
.Bl -dash
|
|
.It
|
|
If the user sets the
|
|
.Fa bp Ns Li "->b_iodone"
|
|
callback to a
|
|
.Pf non- Dv NULL
|
|
function pointer, it will be called in soft interrupt context when the
|
|
I/O transfer is complete.
|
|
The user
|
|
.Em may not
|
|
call
|
|
.Fn biowait bp
|
|
in this case.
|
|
.It
|
|
If
|
|
.Dv B_ASYNC
|
|
is set, then the I/O transfer is asynchronous and the user will not be
|
|
notified when it is completed.
|
|
The user
|
|
.Em may not
|
|
call
|
|
.Fn biowait bp
|
|
in this case.
|
|
.It
|
|
Otherwise, if
|
|
.Fa bp Ns Li "->b_iodone"
|
|
is
|
|
.Dv NULL
|
|
and
|
|
.Dv B_ASYNC
|
|
is not specified, the user may wait for the I/O transfer to complete
|
|
with
|
|
.Fn biowait bp .
|
|
.El
|
|
.Pp
|
|
Once an I/O transfer has completed, its
|
|
.Vt "struct buf"
|
|
may be reused, but the user must first clear the
|
|
.Dv BO_DONE
|
|
flag of
|
|
.Fa bp Ns Li "->b_oflags"
|
|
before reusing it.
|
|
.Sh NESTED I/O TRANSFERS
|
|
Sometimes an I/O transfer from a single buffer in memory cannot go to a
|
|
single location on a block device: it must be split up into smaller
|
|
transfers for each segment of the memory buffer.
|
|
.Pp
|
|
After initializing the
|
|
.Li b_flags ,
|
|
.Li b_data ,
|
|
and
|
|
.Li b_bcount
|
|
parameters of an I/O transfer for the buffer, called the
|
|
.Em master
|
|
buffer, the user can issue smaller transfers for segments of the buffer
|
|
using
|
|
.Fn nestiobuf_setup .
|
|
When nested I/O transfers complete, in any order, they debit from the
|
|
amount of work left to be done in the master buffer.
|
|
If any segments of the buffer were skipped, the user can report this
|
|
with
|
|
.Fn nestiobuf_done
|
|
to debit the skipped part of the work.
|
|
.Pp
|
|
The master buffer's I/O transfer is completed when all nested buffers'
|
|
I/O transfers are completed, and if
|
|
.Fn nestiobuf_done
|
|
is called in the case of skipped segments.
|
|
.Pp
|
|
For writes associated with a vnode
|
|
.Fa vp ,
|
|
.Fn nestiobuf_setup
|
|
accounts for
|
|
.Fa vp Ns Li "->v_numoutput" ,
|
|
so the caller is not allowed to acquire
|
|
.Fa vp Ns Ap s
|
|
vnode lock before submitting the nested I/O transfer to a block
|
|
device.
|
|
However, the caller is responsible for accounting the master buffer in
|
|
.Fa vp Ns Li "->v_numoutput" .
|
|
This must be done very carefully because after incrementing
|
|
.Fa vp Ns Li "->v_numoutput" ,
|
|
the caller is not allowed to acquire
|
|
.Fa vp Ns Ap s
|
|
vnode lock before either calling
|
|
.Fn nestiobuf_done
|
|
or submitting the last nested I/O transfer to a block device.
|
|
.Pp
|
|
For example:
|
|
.Bd -literal -offset abcd
|
|
struct buf *mbp, *bp;
|
|
size_t skipped = 0;
|
|
unsigned i;
|
|
int error = 0;
|
|
|
|
mbp = getiobuf(vp, true);
|
|
mbp->b_data = data;
|
|
mbp->b_resid = mbp->b_bcount = datalen;
|
|
mbp->b_flags = B_WRITE;
|
|
|
|
KASSERT(0 < nsegs);
|
|
KASSERT(datalen == nsegs*segsz);
|
|
for (i = 0; i < nsegs; i++) {
|
|
struct vnode *devvp;
|
|
daddr_t blkno;
|
|
|
|
vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
|
|
error = VOP_BMAP(vp, i*segsz, &devvp, &blkno, NULL);
|
|
VOP_UNLOCK(vp);
|
|
if (error == 0 && blkno == -1)
|
|
error = EIO;
|
|
if (error) {
|
|
/* Give up early, don't try to handle holes. */
|
|
skipped += datalen - i*segsz;
|
|
break;
|
|
}
|
|
|
|
bp = getiobuf(vp, true);
|
|
nestiobuf_setup(bp, mbp, i*segsz, segsz);
|
|
bp->b_blkno = blkno;
|
|
if (i == nsegs - 1) /* Last segment. */
|
|
break;
|
|
VOP_STRATEGY(devvp, bp);
|
|
}
|
|
|
|
/*
|
|
* Account v_numoutput for master write.
|
|
* (Must not vn_lock before last VOP_STRATEGY!)
|
|
*/
|
|
mutex_enter(&vp->v_interlock);
|
|
vp->v_numoutput++;
|
|
mutex_exit(&vp->v_interlock);
|
|
|
|
if (skipped)
|
|
nestiobuf_done(mbp, skipped, error);
|
|
else
|
|
VOP_STRATEGY(devvp, bp);
|
|
.Ed
|
|
.Sh BLOCK DEVICE DRIVERS
|
|
Block device drivers implement a
|
|
.Sq strategy
|
|
method, in the
|
|
.Li d_strategy
|
|
member of
|
|
.Li struct bdevsw
|
|
.Pq Xr driver 9 ,
|
|
to queue a buffer for disk I/O.
|
|
The inputs to the strategy method are:
|
|
.Bl -tag -offset abcd
|
|
.It Fa bp Ns Li "->b_flags"
|
|
Flags specifying the type of transfer.
|
|
.Bl -tag -compact
|
|
.It Dv B_READ
|
|
Transfer is read from device.
|
|
If not set, transfer is write to device.
|
|
.El
|
|
.It Fa bp Ns Li "->b_data"
|
|
Pointer to kernel virtual address of source/target for transfer.
|
|
.It Fa bp Ns Li "->b_bcount"
|
|
Nonnegative number of bytes requested for transfer.
|
|
.It Fa bp Ns Li "->b_blkno"
|
|
Block number at which to do transfer, relative to partition start.
|
|
.El
|
|
.Pp
|
|
If the strategy method uses
|
|
.Xr bufq 9 ,
|
|
it must additionally initialize the following fields before queueing
|
|
.Fa bp
|
|
with
|
|
.Xr bufq_put 9 :
|
|
.Bl -tag -offset abcd
|
|
.It Fa bp Ns Li "->b_rawblkno"
|
|
Block number relative to volume start.
|
|
.El
|
|
.Pp
|
|
When the I/O transfer is complete, whether it succeeded or failed, the
|
|
strategy method must:
|
|
.Bl -dash
|
|
.It
|
|
Set
|
|
.Fa bp Ns Li "->b_error"
|
|
to zero on success, or to an
|
|
.Xr errno 2
|
|
error code on failure.
|
|
.It
|
|
Set
|
|
.Fa bp Ns Li "->b_resid"
|
|
to the number of bytes remaining to transfer, whether on success or
|
|
on failure.
|
|
If no bytes were transferred, this must be set to
|
|
.Fa bp Ns Li "->b_bcount" .
|
|
.It
|
|
Call
|
|
.Fn biodone bp .
|
|
.El
|
|
.Sh FUNCTIONS
|
|
.Bl -tag -width abcd
|
|
.It Fn biodone bp
|
|
Notify that the I/O transfer described by
|
|
.Fa bp
|
|
has completed.
|
|
.Pp
|
|
To be called by a block device driver.
|
|
Caller must first set
|
|
.Fa bp Ns Li "->b_error"
|
|
to an error code and
|
|
.Fa bp Ns Li "->b_resid"
|
|
to the number of bytes remaining to transfer.
|
|
.It Fn biowait bp
|
|
Wait for the synchronous I/O transfer described by
|
|
.Fa bp
|
|
to complete.
|
|
Returns the value of
|
|
.Fa bp Ns Li "->b_error" .
|
|
.Pp
|
|
To be called by a user requesting the I/O transfer.
|
|
.Pp
|
|
May not be called if
|
|
.Fa bp
|
|
has a callback or is asynchronous -- that is, if
|
|
.Fa bp Ns Li "->b_iodone"
|
|
is set, or if
|
|
.Dv B_ASYNC
|
|
is set in
|
|
.Fa bp Ns Li "->b_flags" .
|
|
.It Fn getiobuf vp waitok
|
|
Allocate a
|
|
.Vt "struct buf"
|
|
for an I/O transfer.
|
|
If
|
|
.Fa vp
|
|
is
|
|
.Pf non- Dv NULL ,
|
|
the transfer is associated with it.
|
|
If
|
|
.Fa waitok
|
|
is false,
|
|
returns
|
|
.Dv NULL
|
|
if none can be allocated immediately.
|
|
.Pp
|
|
The resulting
|
|
.Vt "struct buf"
|
|
pointer must eventually be passed to
|
|
.Fn putiobuf
|
|
to release it.
|
|
Do
|
|
.Em not
|
|
use
|
|
.Xr brelse 9 .
|
|
.Pp
|
|
The buffer may not be used for an asynchronous I/O transfer, because
|
|
there is no way to know when it is completed and may be safely passed
|
|
to
|
|
.Fn putiobuf .
|
|
Asynchronous I/O transfers are allowed only for buffers in the
|
|
.Xr buffercache 9 .
|
|
.Pp
|
|
May sleep if
|
|
.Fa waitok
|
|
is true.
|
|
.It Fn putiobuf bp
|
|
Free
|
|
.Fa bp ,
|
|
which must have been allocated by
|
|
.Fn getiobuf .
|
|
Either
|
|
.Fa bp
|
|
must never have been submitted to a block device, or the I/O transfer
|
|
must have completed.
|
|
.El
|
|
.Sh CODE REFERENCES
|
|
The
|
|
.Nm
|
|
subsystem is implemented in
|
|
.Pa sys/kern/vfs_bio.c .
|
|
.Sh SEE ALSO
|
|
.Xr buffercache 9 ,
|
|
.Xr bufq 9
|
|
.Sh BUGS
|
|
The
|
|
.Nm
|
|
abstraction provides no way to cancel an I/O transfer once it has been
|
|
submitted to a block device.
|
|
.Pp
|
|
The
|
|
.Nm
|
|
abstraction provides no way to do I/O transfers with non-kernel pages,
|
|
e.g. directly to buffers in userland without copying into the kernel
|
|
first.
|
|
.Pp
|
|
The
|
|
.Vt "struct buf"
|
|
type is all mixed up with the
|
|
.Xr buffercache 9 .
|
|
.Pp
|
|
The
|
|
.Nm
|
|
abstraction is a totally idiotic API design.
|
|
.Pp
|
|
The
|
|
.Li v_numoutput
|
|
accounting required of
|
|
.Nm
|
|
callers is asinine.
|