This commit adds a new TCP/IP service to MINIX 3. As its core, the
service uses the lwIP TCP/IP stack for maintenance reasons. The
service aims to be compatible with NetBSD userland, including its
low-level network management utilities. It also aims to support
modern features such as IPv6. In summary, the new LWIP service has
support for the following main features:
- TCP, UDP, RAW sockets with mostly standard BSD API semantics;
- IPv6 support: host mode (complete) and router mode (partial);
- most of the standard BSD API socket options (SO_);
- all of the standard BSD API message flags (MSG_);
- the most used protocol-specific socket and control options;
- a default loopback interface and the ability to create one more;
- configuration-free ethernet interfaces and driver tracking;
- queuing and multiple concurrent requests to each ethernet driver;
- standard ioctl(2)-based BSD interface management;
- radix tree backed, destination-based routing;
- routing sockets for standard BSD route reporting and management;
- multicast traffic and multicast group membership tracking;
- Berkeley Packet Filter (BPF) devices;
- standard and custom sysctl(7) nodes for many internals;
- a slab allocation based, hybrid static/dynamic memory pool model.
Many of its modules come with fairly elaborate comments that cover
many aspects of what is going on. The service is primarily a socket
driver built on top of the libsockdriver library, but for BPF devices
it is at the same time also a character driver.
Change-Id: Ib0c02736234b21143915e5fcc0fda8fe408f046f
This new implementation of the UDS service is built on top of the
libsockevent library. It thereby inherits all the advantages that
libsockevent brings. However, the fundamental restructuring
required for that change also paved the way for resolution of a
number of other important open issues with the old UDS code. Most
importantly, the rewrite brings the behavior of the service much
closer to POSIX compliance and NetBSD compatibility. These are the
most important changes:
- due to the use of libsockevent, UDS now supports multiple suspending
calls per socket and a large number of standard socket flags and
options;
- socket address matching is now based on <device,inode> lookups
instead of canonized path names, and socket addresses are no longer
altered either due to canonization or at connect time;
- the socket state machine is now well defined, most importantly
resolving the erroneous reset-on-EOF semantics of the old UDS, but
also allowing socket reuse;
- sockets are now connected before being accepted instead of being
held in connecting state, unless the LOCAL_CONNWAIT option is set
on either the connecting or the listening socket;
- connect(2) on datagram sockets is now supported (needed by syslog),
and proper datagram socket disconnect notification is provided;
- the receive queue now supports segmentation, associating ancillary
data (in-flight file descriptors and credentials) with each segment
instead of being kept fully separately; this is a POSIX requirement
(and needed by tmux);
- as part of the segmentation support, the receive queue can now hold
as many packets as can fit, instead of one;
- in addition to the flags supported by libsockevent, the MSG_PEEK,
MSG_WAITALL, MSG_CMSG_CLOEXEC, MSG_TRUNC, and MSG_CTRUNC send and
receive flags are now supported;
- the SO_PASSCRED and SO_PEERCRED socket options are replaced by
LOCAL_CREDS and LOCAL_PEEREID respectively, now following NetBSD
semantics and allowing use of NetBSD libc's getpeereid(3);
- memory usage is reduced by about 250 KB due to centralized in-flight
file descriptor tracking, with a limit of OPEN_MAX total rather than
of OPEN_MAX per socket;
- memory usage is reduced by another ~50 KB due to removal of state
redundancy, despite the fact that socket path names may now be up to
253 bytes rather than the previous 104 bytes;
- compared to the old UDS, there is now very little direct indexing on
the static array of sockets, thus allowing dynamic allocation of
sockets more easily in the future;
- the UDS service now has RMIB support for the net.local sysctl tree,
implementing preliminary support for NetBSD netstat(1).
Change-Id: I4a9b6fe4aaeef0edf2547eee894e6c14403fcb32
Currently, the BSD socket API is implemented in libc, translating the
API calls to character driver operations underneath. This approach
has several issues:
- it is inefficient, as most character driver operations are specific
to the socket type, thus requiring that each operation start by
bruteforcing the socket protocol family and type of the given file
descriptor using several system calls;
- it requires that libc itself be changed every time system support
for a new protocol is added;
- various parts of the libc implementations violate the asynchronous
signal safety POSIX requirements.
In order to resolve all these issues at once, the plan is to turn the
BSD socket calls into system calls, thus making the BSD socket API the
"native" ABI, removing the complexity from libc and instead letting
VFS deal with the socket calls.
The overall change is going to break all networking functionality. In
order to smoothen the transition, this patch introduces the fifteen
new BSD socket system calls, and makes libc try these first before
falling back on the old behavior. For now, the VFS implementations of
the new calls fail such that libc will always use the fallback cases.
Later on, when we introduce the actual implementation of the native
BSD socket calls, all statically linked programs will automatically
use the new ABI, thus limiting actual application breakage.
In other words: by itself, this patch does nothing, except add a bit
of transitional overhead that will disappear in the future. The
largest part of the patch is concerned with adding full support for
the new BSD socket system calls to trace(1) - this early addition has
the advantage of making system call tracing output of several socket
calls much more readable already.
Both the system call interfaces and the trace(1) support have already
been tested using code that will be committed later on.
Change-Id: I3460812be50c78be662d857f9d3d6840f3ca917f
This patch adds support for Unix98 pseudo terminals, that is,
posix_openpt(3), grantpt(3), unlockpt(3), /dev/ptmx, and /dev/pts/.
The latter is implemented with a new pseudo file system, PTYFS.
In effect, this patch adds secure support for unprivileged pseudo
terminal allocation, allowing programs such as tmux(1) to be used by
non-root users as well. Test77 has been extended with new tests, and
no longer needs to run as root.
The new functionality is optional. To revert to the old behavior,
remove the "ptyfs" entry from /etc/fstab.
Technical nodes:
o The reason for not implementing the NetBSD /dev/ptm approach is that
implementing the corresponding ioctl (TIOCPTMGET) would require
adding a number of extremely hairy exceptions to VFS, including the
PTY driver having to create new file descriptors for its own device
nodes.
o PTYFS is required for Unix98 PTYs in order to avoid that the PTY
driver has to be aware of old-style PTY naming schemes and even has
to call chmod(2) on a disk-backed file system. PTY cannot be its
own PTYFS since a character driver may currently not also be a file
system. However, PTYFS may be subsumed into a DEVFS in the future.
o The Unix98 PTY behavior differs somewhat from NetBSD's, in that
slave nodes are created on ptyfs only upon the first call to
grantpt(3). This approach obviates the need to revoke access as
part of the grantpt(3) call.
o Shutting down PTY may leave slave nodes on PTYFS, but once PTY is
restarted, these leftover slave nodes will be removed before they
create a security risk. Unmounting PTYFS will make existing PTY
slaves permanently unavailable, and absence of PTYFS will block
allocation of new Unix98 PTYs until PTYFS is (re)mounted.
Change-Id: I822b43ba32707c8815fd0f7d5bb7a438f51421c1