The previous system caused a lot of lock contention when transforms were modified in the Cull thread.
The new implementation doesn't use a linked list or lock at all, but a simple atomically incrementing integer that indicates that the prev transforms have changed. set_transform() reads this and backs up the prev transform the first time a transform is modified after reset_all_prev_transforms() is called.
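A minimal sketch of the counter scheme described above, not the actual implementation. Only set_transform() and reset_all_prev_transforms() are names from this commit; the Transform struct, the NodeSketch class, the member names, and the relaxed memory ordering are assumptions made for illustration.

```cpp
#include <atomic>

// Hypothetical stand-in for the real transform type.
struct Transform { float x; };

// Global sequence number. Incrementing it marks every backed-up
// "prev" transform as stale without touching any node.
static std::atomic<unsigned int> g_prev_transform_seq(1);

// Hypothetical node class, for illustration only.
class NodeSketch {
public:
  void set_transform(const Transform &t) {
    unsigned int seq = g_prev_transform_seq.load(std::memory_order_relaxed);
    if (_seq != seq) {
      // First modification since the last reset: back up the old value.
      _prev_transform = _transform;
      _seq = seq;
    }
    _transform = t;
  }

  const Transform &get_transform() const { return _transform; }

  const Transform &get_prev_transform() const {
    // A node not modified since the last reset has prev == current.
    if (_seq != g_prev_transform_seq.load(std::memory_order_relaxed)) {
      return _transform;
    }
    return _prev_transform;
  }

  static void reset_all_prev_transforms() {
    // No list to walk and no lock to take, just one atomic increment.
    g_prev_transform_seq.fetch_add(1, std::memory_order_relaxed);
  }

private:
  Transform _transform{0.0f};
  Transform _prev_transform{0.0f};
  unsigned int _seq = 0;
};
```

The reset becomes O(1) regardless of how many nodes were modified, at the cost of one extra integer per node.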
Use Sleep(0) instead. Sleep(0) is not guaranteed to yield, which is a problem, but Sleep(1) can easily take up to 16 ms, which is really unacceptable except in a very low-priority thread. But really, you shouldn't be relying on force_yield() for anything except with the SIMPLE_THREADS model.
There is also SwitchToThread(), but in fact it is even weaker than Sleep(0).
Adds patomic_signed_lock_free, patomic_unsigned_lock_free, and patomic_flag with wait/notify methods modelled after C++20. Implemented using futexes, falling back to a mutex+condition variable hash table if not supported. (Currently the hash table has a fixed size of 64, which we could increase if necessary, but we really shouldn't even have a fraction of that number of simultaneously sleeping threads...)
Other atomic types are unaffected at the moment, in part because futexes are really restricted to 32-bit ints on Linux anyway.
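The fallback path might look roughly like the sketch below: a fixed table of 64 mutex+condition variable pairs, indexed by a hash of the atomic's address. This is an illustration under assumptions, not the code from this commit; the namespace, the function names, and the hash are all hypothetical.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>

namespace wait_sketch {

struct WaitBucket {
  std::mutex lock;
  std::condition_variable cvar;
};

// Fixed table size, matching the 64 mentioned above.
static const std::size_t num_buckets = 64;
static WaitBucket buckets[num_buckets];

// Hypothetical hash: drop the low alignment bits, then take the modulo.
inline WaitBucket &bucket_for(const void *addr) {
  std::uintptr_t key = reinterpret_cast<std::uintptr_t>(addr);
  return buckets[(key >> 2) % num_buckets];
}

// Blocks for as long as var still holds the value `old`.
inline void wait(const std::atomic<std::uint32_t> &var, std::uint32_t old) {
  WaitBucket &bucket = bucket_for(&var);
  std::unique_lock<std::mutex> holder(bucket.lock);
  while (var.load(std::memory_order_acquire) == old) {
    bucket.cvar.wait(holder);
  }
}

// Wakes every thread sleeping in this bucket. Different atomics may
// share a bucket, so waking only one thread could wake the wrong one;
// each woken waiter re-checks its own variable in the loop above.
inline void notify_all(const std::atomic<std::uint32_t> &var) {
  WaitBucket &bucket = bucket_for(&var);
  // Briefly taking the lock ensures a waiter that has already checked
  // the value is actually asleep before the wake-up is sent.
  { std::lock_guard<std::mutex> holder(bucket.lock); }
  bucket.cvar.notify_all();
}

}  // namespace wait_sketch
```

The caller is expected to store the new value before calling notify_all(), as with the C++20 interface. Bucket collisions only cause spurious wake-ups, which is why a small table suffices when few threads sleep at once.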
std::function has unnecessary overhead; it is better to just create an AsyncTask subclass in-place that stores the closure.
This obsoletes FunctionAsyncTask, which will be removed in a future commit.
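To illustrate the idea, here is a sketch of a task subclass templated on the closure type, so the callable is stored directly in the task object without type erasure. TaskBase is a hypothetical stand-in for AsyncTask, and ClosureTask and make_closure_task are made-up names rather than the ones used in this commit.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical minimal stand-in for the real AsyncTask base class.
class TaskBase {
public:
  virtual ~TaskBase() = default;
  virtual void do_task() = 0;
};

// Stores the closure by value inside the task itself, so there is no
// separate std::function object and no extra indirection on each call.
template<class Callable>
class ClosureTask final : public TaskBase {
public:
  explicit ClosureTask(Callable func) : _func(std::move(func)) {}
  void do_task() override { _func(); }

private:
  Callable _func;
};

// Deduces the closure type so callers can pass a lambda directly.
template<class Callable>
std::unique_ptr<TaskBase> make_closure_task(Callable &&func) {
  using Stored = typename std::decay<Callable>::type;
  return std::unique_ptr<TaskBase>(
      new ClosureTask<Stored>(std::forward<Callable>(func)));
}
```

Each distinct lambda produces its own ClosureTask instantiation, which trades a little code size for avoiding the std::function wrapper.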
Timer queries are significantly more efficient, are synchronized to CPU time, and the synchronized frame numbering makes it possible to correlate stuff in the Timeline view
- New powerful scrolling Timeline view for seeing all time events across all threads
- Redo flame graph to use stack-based nesting rather than the standard collector nesting
- Rewrite flame graph drawing to not use labels
- Status bar appears in main window showing top-level collectors; double-clicking them brings up their chart and right-clicking them shows their children
- Context menus are added when right-clicking labels and charts
- Tooltips now appear when mouse hovers over collector area in a chart
- Strip chart windows now automatically determine the appropriate scale better
- Graph menus redone to allow opening flame chart anywhere as well as strip chart
- Instead of just ms everywhere, also use s / us / ns where appropriate
- Don't disable smoothing right away on mouse down on strip chart, only after dragging
- Windows: The MDI child windows are quite ugly and overlap with the status bar, so instead they are now top-level windows, but some code is added to make them spawn inside and move with the parent window, and minimize to its corner. I can back this out if people prefer the old behavior despite the ugly decoration
- Windows: Label text shows ellipsis when cut off
- Windows: Graph windows no longer have icons
- Windows: Graph windows no longer spawn perfectly on top of each other, rather cascading
- GTK: Render at high resolution when GDK_SCALE is not 1
- GTK: Graph windows are forced to be floating in tiling WMs
- GTK: Flame chart window no longer has useless dividing bar
- GTK: Use more efficient cairo surface types
- "App:Show code:General" is gone, it was causing too much trouble
- Replace odd "Client::GuiObjects" with "Nodes:GUI"
- Regroup "Dirty PipelineCyclers" underneath "PipelineCyclers"
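The unit selection mentioned in the list above (s / ms / us / ns instead of ms everywhere) could be sketched as follows. format_duration is a hypothetical helper, and the thresholds and the single decimal place are assumptions, not necessarily what the charts use.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical helper: picks the largest unit that keeps the printed
// magnitude at 1 or above, falling back to ns for anything smaller.
std::string format_duration(double seconds) {
  double magnitude = std::fabs(seconds);
  const char *unit;
  double scale;
  if (magnitude >= 1.0) {
    unit = "s";
    scale = 1.0;
  } else if (magnitude >= 1e-3) {
    unit = "ms";
    scale = 1e3;
  } else if (magnitude >= 1e-6) {
    unit = "us";
    scale = 1e6;
  } else {
    unit = "ns";
    scale = 1e9;
  }
  char buffer[64];
  std::snprintf(buffer, sizeof(buffer), "%.1f %s", seconds * scale, unit);
  return std::string(buffer);
}
```

For example, format_duration(0.25) yields "250.0 ms" and format_duration(2.5) yields "2.5 s".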
Partial backport of 07545bc9e318d1799ceabe8838d04d7ad9297a45 for Windows, requires building with `--override USE_MEMORY_MIMALLOC=1 --override USE_DELETED_CHAIN=UNDEF` for optimum effect
We don't need the extra precision, in fact it is detrimental to restoring build caches in a cross-platform way.
This commit will invalidate all current build caches.
Cherry-picked from 2a904f398592ce7effedc4f12720be0cef9b6cc9 (see #1260)
For experimentation only - it's disabled by default unless you also specify --override USE_MEMORY_MIMALLOC=1 (I did not see a discernible benefit over glibc, but more experimentation is warranted, especially with older glibc versions)
Windows' malloc has awful performance. mimalloc is orders of magnitude faster, even faster than DeletedBufferChain. Therefore, only enable USE_DELETED_CHAIN on Windows when building without mimalloc.
On Linux, mimalloc doesn't appear to be measurably faster than glibc's own allocator. Both are marginally faster than DeletedBufferChain, though, and substantially faster in the multi-threaded case, so USE_DELETED_CHAIN is disabled there in all cases.
Speedup is realised by using thread-local variables. Note that on Windows we can't inline get_current_thread, but it's still faster this way than calling TlsGetValue.
In theory the cache line alignment should help avoid false sharing but I have not profiled that extensively.
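A rough sketch of the two techniques mentioned here: a thread_local pointer to the current thread object, and a thread object aligned to a cache line. All names are hypothetical, and the 64-byte line size is an assumption that holds on most x86-64 hardware but is not universal.

```cpp
#include <cstddef>

// Aligning (and thereby padding) the object to a full cache line keeps
// two thread objects from sharing a line, which is what would allow
// false sharing. 64 bytes is an assumed line size.
class alignas(64) ThreadSketch {
public:
  explicit ThreadSketch(int id) : _id(id) {}
  int get_id() const { return _id; }

private:
  int _id;
};

// One pointer per thread, resolved by the compiler's TLS mechanism
// rather than by an explicit TlsGetValue()-style lookup.
static thread_local ThreadSketch *current_thread_ptr = nullptr;

// Called once by each thread when it starts running.
inline void bind_current_thread(ThreadSketch *thread) {
  current_thread_ptr = thread;
}

// In this single-file sketch the accessor can be inline. Across a DLL
// boundary on Windows it would have to be an ordinary exported function.
inline ThreadSketch *get_current_thread_sketch() {
  return current_thread_ptr;
}
```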
* New "Flame Graph" chart for seeing all collectors in a frame, much easier to read than piano roll
* Update controls, fonts, background color to more modern visual style on Windows
* Proper support for high DPI monitors (with correct scaling)
* Add tooltips for collector labels showing full name and averaged value
* Colors of collectors are now converted to sRGB transfer encoding
* Major performance improvement to piano roll view on Windows
* Moving the mouse over labels now highlights the corresponding area in chart
* Label hover effect changed to darkening effect instead of border
* Reimplement graph as static common control on Windows
* Check boxes are now clickable by their label on Windows
* Graph windows have minimum sizes on Windows
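For reference, the sRGB transfer encoding mentioned in the list above is the standard piecewise function applied per color channel. The sketch below shows that formula only; encode_srgb is a hypothetical name, and how the result is applied to collector colors is not shown.

```cpp
#include <cmath>

// Converts one linear color channel in [0, 1] to its sRGB-encoded
// value using the standard piecewise sRGB transfer function.
inline float encode_srgb(float linear) {
  if (linear <= 0.0f) {
    return 0.0f;
  }
  if (linear >= 1.0f) {
    return 1.0f;
  }
  if (linear < 0.0031308f) {
    // Linear segment near black.
    return linear * 12.92f;
  }
  // Power-law segment for the rest of the range.
  return 1.055f * std::pow(linear, 1.0f / 2.4f) - 0.055f;
}
```

A linear mid-gray of 0.5 encodes to roughly 0.735, which is why linear colors look too dark when displayed without this conversion.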