This commit adds a new TCP/IP service to MINIX 3. As its core, the
service uses the lwIP TCP/IP stack for maintenance reasons. The
service aims to be compatible with NetBSD userland, including its
low-level network management utilities. It also aims to support
modern features such as IPv6. In summary, the new LWIP service has
support for the following main features:
- TCP, UDP, RAW sockets with mostly standard BSD API semantics;
- IPv6 support: host mode (complete) and router mode (partial);
- most of the standard BSD API socket options (SO_);
- all of the standard BSD API message flags (MSG_);
- the most used protocol-specific socket and control options;
- a default loopback interface and the ability to create one more;
- configuration-free ethernet interfaces and driver tracking;
- queuing and multiple concurrent requests to each ethernet driver;
- standard ioctl(2)-based BSD interface management;
- radix tree backed, destination-based routing;
- routing sockets for standard BSD route reporting and management;
- multicast traffic and multicast group membership tracking;
- Berkeley Packet Filter (BPF) devices;
- standard and custom sysctl(7) nodes for many internals;
- a slab allocation based, hybrid static/dynamic memory pool model.
Many of its modules come with fairly elaborate comments that cover
many aspects of what is going on. The service is primarily a socket
driver built on top of the libsockdriver library, but for BPF devices
it is at the same time also a character driver.
Change-Id: Ib0c02736234b21143915e5fcc0fda8fe408f046f
This new implementation of the UDS service is built on top of the
libsockevent library. It thereby inherits all the advantages that
libsockevent brings. However, the fundamental restructuring
required for that change also paved the way for resolution of a
number of other important open issues with the old UDS code. Most
importantly, the rewrite brings the behavior of the service much
closer to POSIX compliance and NetBSD compatibility. These are the
most important changes:
- due to the use of libsockevent, UDS now supports multiple suspending
calls per socket and a large number of standard socket flags and
options;
- socket address matching is now based on <device,inode> lookups
instead of canonized path names, and socket addresses are no longer
altered either due to canonization or at connect time;
- the socket state machine is now well defined, most importantly
resolving the erroneous reset-on-EOF semantics of the old UDS, but
also allowing socket reuse;
- sockets are now connected before being accepted instead of being
held in connecting state, unless the LOCAL_CONNWAIT option is set
on either the connecting or the listening socket;
- connect(2) on datagram sockets is now supported (needed by syslog),
and proper datagram socket disconnect notification is provided;
- the receive queue now supports segmentation, associating ancillary
data (in-flight file descriptors and credentials) with each segment
instead of being kept fully separately; this is a POSIX requirement
(and needed by tmux);
- as part of the segmentation support, the receive queue can now hold
as many packets as can fit, instead of one;
- in addition to the flags supported by libsockevent, the MSG_PEEK,
MSG_WAITALL, MSG_CMSG_CLOEXEC, MSG_TRUNC, and MSG_CTRUNC send and
receive flags are now supported;
- the SO_PASSCRED and SO_PEERCRED socket options are replaced by
LOCAL_CREDS and LOCAL_PEEREID respectively, now following NetBSD
semantics and allowing use of NetBSD libc's getpeereid(3);
- memory usage is reduced by about 250 KB due to centralized in-flight
file descriptor tracking, with a limit of OPEN_MAX total rather than
of OPEN_MAX per socket;
- memory usage is reduced by another ~50 KB due to removal of state
redundancy, despite the fact that socket path names may now be up to
253 bytes rather than the previous 104 bytes;
- compared to the old UDS, there is now very little direct indexing on
the static array of sockets, thus allowing dynamic allocation of
sockets more easily in the future;
- the UDS service now has RMIB support for the net.local sysctl tree,
implementing preliminary support for NetBSD netstat(1).
Change-Id: I4a9b6fe4aaeef0edf2547eee894e6c14403fcb32
This change effectively adds the VFS side of support for the SO_LINGER
socket option, by allowing file descriptor close operations to be
suspended (and later resumed) by socket drivers. Currently, support
is limited to the close(2) system call--in all other cases where file
descriptors are closed (dup2, close-on-exec, process exit..), the
close operation still completes instantly. As a general policy, the
close(2) return value will always indicate that the file descriptor
has been closed: either 0, or -1 with errno set to EINPROGRESS. The
latter error may be thrown only when a suspended close is interrupted
by a signal.
As necessary for UDS, this change also introduces a closenb(2) system
call extension, allowing the caller to bypass blocking SO_LINGER close
behavior. This extension allows UDS to avoid blocking on closing the
last reference to an in-flight file descriptor, in an atomic fashion.
The extension is currently part of libsys, but there is no reason why
userland would not be allowed to make this call, so it is deliberately
not protected from use by userland.
Change-Id: Iec77d6665232110346180017fc1300b1614910b7
IMPORTANT: this change has a docs/UPDATING entry!
This rename is unfortunately necessary because NetBSD has decided to
create its own service(8) utility, and we will want to import theirs
as well. The two can obviously not coexist.
Also move ours from /bin to /sbin, as it is a superuser-only utility.
Change-Id: Ic6e46ffb3a84b4747d2fdcb0d74e62dbea065039
- POLLRDBAND is reported by select(2) as errorfd, not readfd;
- POLLERR is not the same as errorfd of select(2);
- flags that are not requested should not be returned.
Change-Id: I9cb3c2c260ead5a2852a2fbbc10280c2b5b0dff9
- clear "revents" fields even when the call times out;
- do not call FD_ISSET with a negative file descriptor number.
Change-Id: I7aeaae79e73e39aed127a75495ea08256b18c182
With this patch, it is now possible to generate coverage information
for MINIX3 system services with LLVM. In particular, the system can
be built with MKCOVERAGE=yes, either with a native "make build" or
with crosscompilation. Either way, MKCOVERAGE=yes will build the
MINIX3 system services with coverage profiling support, generating a
.gcno file for each source module. After a reboot it is possible to
obtain runtime coverage data (.gcda files) for individual system
services using gcov-pull(8). The combination of the .gcno and .gcda
files can then be inspected with llvm-cov(1).
For reasons documented in minix.gcov.mk, only system service program
modules are supported for now; system service libraries (libsys etc.)
are not included. Userland programs are not affected by MKCOVERAGE.
The heart of this patch is the libsys code that writes data generated
by the LLVM coverage hooks into a serialized format using the routines
we already had for GCC GCOV. Unfortunately, the new llvm_gcov.c code
is LLVM ABI dependent, and may therefore have to be updated later when
we upgrade LLVM. The current implementation should support all LLVM
versions 3.x with x >= 4.
The rest of this patch is mostly a light cleanup of our existing GCOV
infrastructure, with as most visible change that gcov-pull(8) now
takes a service label string rather than a PID number.
Change-Id: I6de055359d3d2b3f53e426f3fffb17af7877261f
The way these options work is by creating files that contain debugging
symbols and stashing them in a dedicated set. The minix-debug set has
been created for this purpose, but it will probably have to be refined
since it has been tested only with the default options with an i386
cross-build.
LSC: Amended to support many combination of MKDEBUG, MKDEBUGLIB, with
and without X11, for both intel and arm.
Change-Id: I2901952e8229938f9ac79c8656484acf704ccd9b
Previously, VFS would use various subsets of a number of fproc
structure fields to store state when the process is blocked
(suspended) for various reasons. As a result, there was a fair
amount of abuse of fields, hidden state, and confusion as to
which fields were used with which suspension states.
Instead, the suspension state is now split into per-state
structures, which are then stored in a union. Each of the union's
structures should be accessed only right before, during, and right
after the fp_blocked_on field is set to the corresponding blocking
type. As a result, it is now very clear which fields are in use
at which times, and we even save a bit of memory as a side effect.
Change-Id: I5c24e353b6cb0c32eb41c70f89c5cfb23f6c93df
Instead, filter it in libc for old networking implementations, as
those do not support sending SIGPIPE to user processes anyway. This
change allows newer socket drivers to implement the flag as per the
specification.
Change-Id: I423bdf28ca60f024a344d0a73e2eab38f1b269da
Commit git-c38dbb9 inadvertently broke local MINIX3-on-MINIX3 builds,
since its libc changes relied on VFS being upgraded already as well.
As a result, after installing the new libc, networking ceased to work,
leading to curl(1) failing later on in the build process. This patch
introduces transitional code that is necessary for the build process
to complete, after which it is obsolete again.
Change-Id: I93bf29c01d228e3d7efc7b01befeff682954f54d
Currently, the BSD socket API is implemented in libc, translating the
API calls to character driver operations underneath. This approach
has several issues:
- it is inefficient, as most character driver operations are specific
to the socket type, thus requiring that each operation start by
bruteforcing the socket protocol family and type of the given file
descriptor using several system calls;
- it requires that libc itself be changed every time system support
for a new protocol is added;
- various parts of the libc implementations violate the asynchronous
signal safety POSIX requirements.
In order to resolve all these issues at once, the plan is to turn the
BSD socket calls into system calls, thus making the BSD socket API the
"native" ABI, removing the complexity from libc and instead letting
VFS deal with the socket calls.
The overall change is going to break all networking functionality. In
order to smoothen the transition, this patch introduces the fifteen
new BSD socket system calls, and makes libc try these first before
falling back on the old behavior. For now, the VFS implementations of
the new calls fail such that libc will always use the fallback cases.
Later on, when we introduce the actual implementation of the native
BSD socket calls, all statically linked programs will automatically
use the new ABI, thus limiting actual application breakage.
In other words: by itself, this patch does nothing, except add a bit
of transitional overhead that will disappear in the future. The
largest part of the patch is concerned with adding full support for
the new BSD socket system calls to trace(1) - this early addition has
the advantage of making system call tracing output of several socket
calls much more readable already.
Both the system call interfaces and the trace(1) support have already
been tested using code that will be committed later on.
Change-Id: I3460812be50c78be662d857f9d3d6840f3ca917f
The reorganization allows other libc system call wrappers (namely,
sendmsg and recvmsg) to perform I/O vector coalescing as well.
Change-Id: I116b48a6db39439053280ee805e0dcbdaec667a3
There is no reason to use a single message for nonoverlapping requests
and replies combined, and in fact splitting them out allows reuse of
messages and avoids various problems with field layouts. Since the
upcoming socketpair(2) system call will be using the same reply as
pipe2(2), split up the single message used for the latter. In order
to keep the used parts of messages at the front, start a transitional
phase to move the pipe(2) flags field to the front of its request.
Change-Id: If3f1c3d348ec7e27b7f5b7147ce1b9ef490dfab9
Previously, the libc sendto(3) and recvfrom(3) implementations would
blindly assume that any unrecognized socket is a raw-IP socket. This
is not only inconsistent but also messes with returned error codes.
Change-Id: Id0328f04ea8ca0968a4e8636bc441caa0c3579b7
- switch to the NetBSD identifier system; it is not only better, but
also required for porting NetBSD ipcs(1) and ipcrm(1); however, it
requires that slots not be moved, and that results in some changes;
- synchronize some other things with NetBSD: where keys are kept, as
well as various non-permission mode flags;
- fix semctl(2) vararg retrieval and message field type;
- use SUSPEND instead of weird reply exceptions in the call table;
- fix several memory leaks and at least one missing permission check;
- improve the atomicity of semop(2) by a small amount, even though
its atomicity is still broken at a fundamental level;
- use the new cheaper way to retrieve the current time;
- resolve all level-5 LLVM warnings.
Change-Id: I0c47aacde478b23bb77d628384aeab855a22fdbf
Now that uname(3) uses sysctl(2), we no longer need sysuname(2).
Backward compatibility is retained for old statically linked
binaries for a short while.
Also remove the now-obsolete MINIX3-specific "arch" field from the
utsname structure. While this is an ABI break at the libc level,
it should pose no problems in practice, because:
- statically linked programs (i.e., all of the base system) are not
affected, as they will use headers synchronized with libc;
- the structure is getting smaller, thus, older dynamically linked
programs (typically in pkgsrc) using the new libc will end up with
garbage in the "arch" field, but it is unlikely they will use this
field anyway, since it was specific to MINIX3;
- new dynamically linked programs using an old libc could end up with
memory corruption, but this is not a scenario that is expected to
occur in the first place - certainly not with programs from pkgsrc.
Change-Id: I29c76576f509feacc8f996f0bd353ca8961d4917
The new MIB service implements the sysctl(2) system call which, as
we adopt more NetBSD code, is an increasingly important part of the
operating system API. The system call is implemented in the new
service rather than as part of an existing service, because it will
eventually call into many other services in order to gather data,
similar to ProcFS. Since the sysctl(2) functionality is used even
by init(8), the MIB service is added to the boot image.
MIB stands for Management Information Base, and the MIB service
should be seen as a knowledge base of management information.
The MIB service implementation of the sysctl(2) interface is fairly
complete; it incorporates support for both static and dynamic nodes
and imitates many NetBSD-specific quirks expected by userland. The
patch also adds trace(1) support for the new system call, and adds
a new test, test87, which tests the fundamental operation of the
MIB service rather thoroughly.
Change-Id: I4766b410b25e94e9cd4affb72244112c2910ff67
This brings our tree to NetBSD 7.0, as found on -current on the
10-10-2015.
This updates:
- LLVM to 3.6.1
- GCC to GCC 5.1
- Replace minix/commands/zdump with usr.bin/zdump
- external/bsd/libelf has moved to /external/bsd/elftoolchain/
- Import ctwm
- Drop sprintf from libminc
Change-Id: I149836ac18e9326be9353958bab9b266efb056f0
This patch adds support for the wait4 system call, and with that the
wait3 call as well. The implementation is absolutely minimal: only
user and system times of the exited child are returned (with all other
rusage fields left zero), and there is no support for tracers. Still,
this should cover the main use cases of wait4.
Change-Id: I7a04589a8423a23990ab39aa38e85d535556743a
- the userland call is now made to PM only, and PM relays the call to
other servers as appropriate; this is an ABI change that will
ultimately allow us to add proper support for wait3() and the like;
for the moment there is backward compatibility;
- the getrusage-specific kernel subcall has been removed, as it
provided only redundant functionality, and did not provide the means
to be extended correctly in the future - namely, allowing the kernel
to return different values depending on whether resource usage of
the caller (self) or its children was requested;
- VM is now told whether resource usage of the caller (self) or its
children is requested, and it refrains from filling in wrong values
for information it does not have;
- VM now uses the correct unit for the ru_maxrss values;
- VFS is cut out of the loop entirely, since it does not provide any
values at the moment; a comment explains how it should be readded.
Change-Id: I27b0f488437dec3d8e784721c67b03f2f853120f
Various generic file IOCTL calls should be processed by VFS rather
than individual drivers. For this reason, we rewrite them to use
fcntl(2) instead.
Change-Id: I38a5f3c7b21943a897114a51678a800f7e7f0a77
Currently, the userland ABI uses a single field ('user_sp') far
into the very large 'kinfo' structure on the shared kernel
information page. This precludes us from modifying or getting
rid of 'kinfo' in the future without breaking userland. This
patch adds a separate 'kuserinfo' structure to the kernel
information page, with only information that is part of the
userland ABI, in an extensible manner. Userland now uses this
field if it is present, and falls back to the old field if not.
Change-Id: Ib7b24b53a440f40a2edc28cdfa48447ac2179288
Instead of importing an external _minix_kerninfo variable, any code
using the shared kernel page should now call get_minix_kerninfo(3).
Since this is the only logical name for such a function, rename the
previous get_minix_kerninfo call to ipc_minix_kerninfo.
Change-Id: I2e424b6fb55aa55d3da850187f1f7a0b7cbbf910
The entire infrastructure relied on an ACK feature, and as such, it
has been broken for years now, with no easy way to repair it.
Change-Id: I783c2a21276967af115a642199f31fef0f14a572
- synchronize request type with ioctl by making it unsigned long;
- unbreak VFS requests, as they were being sent to PM;
- use proper ioctl direction flags (and new numbers) for requests;
- remove some needless header inclusions;
- svrctl is in libc, make its message name reflect this;
- keep backward compatibility: svrctl is part of the userland ABI.
Change-Id: I44902e8d0d11b8ebc1ef3bda94d2202481743c9b
UDS expects the device number of the actual socket, not the device on
which the socket happens to reside. The code worked only because PFS
returned the same value in the st_dev stat field, which it will have
to continue doing for a while now.
Change-Id: I426d38a86a96307ff6e6ed8099d37dae02d6bf2b
- replace a stray assert(0) with abort()
- remove unrequired copy-pasted #undef NDEBUG
- replace some #if NDEBUG by #if DEBUG as they protect debug printf()s.
Change-Id: Iff4c0331b06e860d32d91ce6b1d6c765ed065c8b
This concerns all services, a.k.a drivers, filesystem drivers, network
(inet, lwip, uds) servers, and the system servers.
Change-Id: I626fd15c795e15af42df2d10d47fb4a703665d63