mirror of https://github.com/vlang/v.git
synced 2025-09-11 16:36:20 -04:00

doc: update the section Performance tuning (#19530)

parent 04eb86c9fa
commit 04c8d45934

235 doc/docs.md
@@ -6042,63 +6042,222 @@ To improve safety and maintainability, operator overloading is limited.

## Performance tuning

When compiled with `-prod`, V's generated C code usually performs well. However, in specialized
scenarios, additional compiler flags and attributes can further optimize the executable for
performance, memory usage, or size.

> [!NOTE]
> These are *rarely* needed, and should not be used unless you
> *profile your code* and see that they bring significant benefits.
> To cite GCC's documentation: "Programmers are notoriously bad at predicting
> how their programs actually perform".

| Tuning Operation         | Benefits                        | Drawbacks                                  |
|--------------------------|---------------------------------|--------------------------------------------|
| `[inline]`               | Performance                     | Increased executable size                  |
| `[direct_array_access]`  | Performance                     | Safety risks                               |
| `[packed]`               | Memory usage                    | Potential performance loss                 |
| `[minify]`               | Performance, Memory usage       | May break binary serialization/reflection  |
| `_likely_/_unlikely_`    | Performance                     | Risk of negative performance impact        |
| `-skip-unused`           | Performance, Compile time, Size | Potential instability                      |
| `-fast-math`             | Performance                     | Risk of incorrect mathematical results     |
| `-d no_segfault_handler` | Compile time, Size              | Loss of segfault trace                     |
| `-cflags -march=native`  | Performance                     | Risk of reduced CPU compatibility          |
| `PGO`                    | Performance, Size               | Usage complexity                           |

### Tuning operations details

#### `[inline]`

You can tag functions with `[inline]`, so the C compiler will try to inline them. In some
cases this is beneficial for performance, but it may increase the size of your executable.

**When to Use**

- Functions that are called frequently in performance-critical loops.

**When to Avoid**

- Large functions, as it might cause code bloat and actually decrease performance.
- Large functions in `if` expressions, as they may have a negative impact on the instruction cache.
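
As a minimal sketch of the "frequently called in a hot loop" case, the example below tags a tiny helper with `[inline]`. The names `square` and `sum_squares` are hypothetical, and the attribute is written in the `[attr]` form used in this section; newer V releases may expect the `@[inline]` spelling instead.

```v
// `square` is small and cheap, so asking the C compiler to inline it
// removes the call overhead without adding much code at each call site.
// The attribute is only a hint; the C compiler may still ignore it.
[inline]
fn square(x int) int {
	return x * x
}

// sum_squares calls `square` once per iteration of its loop.
fn sum_squares(n int) int {
	mut total := 0
	for i in 0 .. n {
		total += square(i)
	}
	return total
}

fn main() {
	println(sum_squares(1000))
}
```

Whether this actually helps depends on the C compiler and optimization level, so it is worth measuring both variants with `-prod` before keeping the attribute.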

#### `[direct_array_access]`

In functions tagged with `[direct_array_access]`, the compiler translates array operations
directly into C array operations, omitting bounds checking. This may save a lot of time in a
function that iterates over an array, but at the cost of making the function unsafe, unless the
bounds are checked by the user.

**When to Use**

- In tight loops that access array elements, where bounds have been manually verified or you are
  sure that the access index will be valid.

**When to Avoid**

- Everywhere else.
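
The following sketch shows the one situation described above where the attribute is reasonably safe: the loop index is derived from `values.len`, so every access is in range by construction. The function name `sum_ints` is hypothetical, and the `[attr]` spelling follows this section (newer V releases may expect `@[direct_array_access]`).

```v
// Bounds checks are omitted inside this function, so correctness relies
// entirely on the loop never producing an out-of-range index.
[direct_array_access]
fn sum_ints(values []int) int {
	mut total := 0
	// `i` only takes values in 0 .. values.len, so `values[i]` is always valid.
	for i in 0 .. values.len {
		total += values[i]
	}
	return total
}

fn main() {
	println(sum_ints([1, 2, 3, 4]))
}
```

If the index came from user input or from another array's length, the same function would read out of bounds silently instead of panicking.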

#### `[packed]`

The `[packed]` attribute can be added to a structure to create an unaligned memory layout,
which decreases the overall memory footprint of the structure. Using the `[packed]` attribute
may negatively impact performance or even be prohibited on certain CPU architectures.

**When to Use**

- When memory usage is more critical than performance, e.g., in embedded systems.

**When to Avoid**

- On CPU architectures that do not support unaligned memory access or when high-speed memory access
  is needed.
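
To make the memory effect concrete, the sketch below declares the same two fields with and without `[packed]` and prints both sizes. The struct names are hypothetical, the exact numbers depend on the target platform and C compiler, and the `[attr]` spelling follows this section (newer V releases may expect `@[packed]`).

```v
// Default layout: the C compiler may insert padding after `tag`
// so that `value` starts at an aligned address.
struct PlainHeader {
	tag   u8
	value u32
}

// Packed layout: fields are placed back to back, without padding.
[packed]
struct PackedHeader {
	tag   u8
	value u32
}

fn main() {
	println(sizeof(PlainHeader))
	println(sizeof(PackedHeader))
}
```

On common 64-bit targets the packed variant is expected to be smaller, at the cost of `value` possibly being read through an unaligned access.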

#### `[minify]`

The `[minify]` attribute can be added to a struct, allowing the compiler to reorder the fields in
a way that minimizes internal gaps while maintaining alignment. Using the `[minify]` attribute may
cause issues with binary serialization or reflection. Be mindful of these potential side effects
when using this attribute.

**When to Use**

- When you want to minimize memory usage and you're not using binary serialization or reflection.

**When to Avoid**

- When using binary serialization or reflection, as it may cause unexpected behavior.
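
A small sketch of the field-reordering effect, assuming a deliberately badly ordered struct: a `u64` sits between two `u8` fields, which normally forces padding on both sides. The struct names are hypothetical, the resulting sizes are platform dependent, and the `[attr]` spelling follows this section (newer V releases may expect `@[minify]`).

```v
// Declared order forces padding before `b` and after `c` on typical targets.
struct PlainRecord {
	a u8
	b u64
	c u8
}

// Same fields, but the compiler is allowed to reorder them to close the gaps.
// Alignment is kept, so field access stays aligned, unlike with `[packed]`.
[minify]
struct MinifiedRecord {
	a u8
	b u64
	c u8
}

fn main() {
	println(sizeof(PlainRecord))
	println(sizeof(MinifiedRecord))
}
```

Because the in-memory order may no longer match the declared order, code that writes the raw bytes of such a struct to disk or over the network should not rely on it.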

#### `_likely_/_unlikely_`

`if _likely_(bool expression) {` hints to the C compiler that the passed boolean expression is
very likely to be true, so it can generate assembly code with less chance of branch misprediction.
In the JS backend, it does nothing.

`if _unlikely_(bool expression) {` is similar to `_likely_(x)`, but it hints that the boolean
expression is highly improbable. In the JS backend, it does nothing.

**When to Use**

- In conditional statements where one branch is clearly more frequently executed than the other.

**When to Avoid**

- When the prediction can be wrong, as it might cause a performance penalty due to branch
  misprediction.
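
As an illustration, the sketch below marks an error path as improbable. The function `checked_div` is hypothetical, and the assumption that a zero divisor is rare is exactly the kind of claim that should be confirmed by profiling first.

```v
// checked_div returns 0 for a zero divisor instead of dividing.
// The zero case is assumed to be rare, so it is marked as unlikely;
// the hint changes only code layout, never the result.
fn checked_div(a int, b int) int {
	if _unlikely_(b == 0) {
		return 0
	}
	return a / b
}

fn main() {
	println(checked_div(10, 2))
	println(checked_div(1, 0))
}
```

If real inputs turn out to contain many zero divisors, the hint works against the program and should be removed.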

#### `-skip-unused`

This flag tells the V compiler to omit code that is not needed in the final executable to run your
program correctly. This removes unneeded `const` array allocations and unused functions
from the code in the generated executable.

This flag will be on by default in the future, once its implementation has stabilized and all
severe bugs have been found.

**When to Use**

- For production builds where you want to reduce the executable size and improve runtime
  performance.

**When to Avoid**

- Where it doesn't work for you.

#### `-fast-math`

This flag enables optimizations that disregard strict compliance with the IEEE standard for
floating-point arithmetic. While this could lead to faster code, it may produce incorrect or
less accurate mathematical results.

The full spectrum of math operations that `-fast-math` affects can be found
[here](https://clang.llvm.org/docs/UsersManual.html#cmdoption-ffast-math).

**When to Use**

- In applications where performance is more critical than precision, like certain graphics
  rendering tasks.

**When to Avoid**

- In applications requiring strict mathematical accuracy, such as scientific simulations or
  financial calculations.

#### `-d no_segfault_handler`

Using this flag omits the segfault handler, reducing the executable size and potentially improving
compile time. However, in the case of a segmentation fault, the output will not contain stack trace
information, making debugging more challenging.

**When to Use**

- In small, well-tested utilities where a stack trace is not essential for debugging.

**When to Avoid**

- In large-scale, complex applications where robust debugging is required.

#### `-cflags -march=native`

This flag directs the C compiler to generate instructions optimized for the host CPU. This can
improve performance but will produce an executable incompatible with other/older CPUs.

**When to Use**

- When the software is intended to run only on the build machine or in a controlled environment
  with identical hardware.

**When to Avoid**

- When distributing the software to users with potentially older CPUs.

#### PGO (Profile-Guided Optimization)

PGO allows the compiler to optimize code based on its behavior during sample runs. This can improve
performance and reduce the size of the output executable, but it adds complexity to the build
process.

**When to Use**

- For performance-critical applications where the added build complexity is justifiable.

**When to Avoid**

- For small, short-lived, or rapidly-changing projects where the added build complexity isn't
  justified.

**PGO with Clang**

This is an example bash script you can use to optimize your CLI V program without user interaction.
In most cases, you will need to change this script to make it suitable for your particular program.

```bash
#!/bin/bash

# Get the full path to the current directory
CUR_DIR=$(pwd)

# Remove existing PGO data
rm -f *.profraw
rm -f default.profdata

# Initial build with PGO instrumentation
v -cc clang -skip-unused -prod -cflags -fprofile-generate -o pgo_gen .

# Run the instrumented executable 10 times
for i in {1..10}; do
    ./pgo_gen
done

# Merge the collected data
llvm-profdata merge -o default.profdata *.profraw

# Compile the optimized version using the PGO data
v -cc clang -skip-unused -prod -cflags "-fprofile-use=${CUR_DIR}/default.profdata" -o optimized_program .

# Remove PGO data and instrumented executable
rm *.profraw
rm pgo_gen
```

## Atomics