
/hydrus/ - Hydrus Network

Bug reports, feature requests, and other discussion for the hydrus network.




https://youtube.com/watch?v=iePWOFOSl_U
windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v452/Hydrus.Network.452.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v452/Hydrus.Network.452.-.Windows.-.Installer.exe
macOS
app: https://github.com/hydrusnetwork/hydrus/releases/download/v452/Hydrus.Network.452.-.macOS.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v452/Hydrus.Network.452.-.Linux.-.Executable.tar.gz

I had an ok week. I ended up doing a lot of boring behind the scenes prep, but there is also some nice new quality of life. If you sync with the PTR, update will take a few minutes this week.

all misc this week

After a very long delay, I finally have shortcuts for seeking video back/forwards in the media viewer. New users will get ctrl+left/right to seek back 2.5 seconds or forwards 5 seconds. Existing users need to add their own--please check it under the 'media viewers - all' shortcut set. You can set whatever seek distance you like, and even set multiple with different distance jumps. Sorry for how long it took to get this in--I had to update my shortcut system first!

I did some behind the scenes tag work this week. The database can now handle larger 'tag as number' searches. Update will take a few minutes to load these bigger numbers into the fast search cache.

If you have a subscription that is not meant to completely sync (e.g. it pulls a sample from a gallery with unusual sort order, or you have the 'normal file limit' intentionally small), there is a new checkbox that suppresses the 'hey, there was a gap, click here to fill it in' popup windows.

full list

- misc:
- my 'simple' shortcut commands can now store additional variables. to start things off, I have finally added 'seek video' shortcut commands that have back/forwards and second+millisecond values. these should work on the native video viewer and mpv, audio and video. existing users will have to add their own (do it to the 'media viewers - all' set), but new users will get ctrl+left/right for 2.5s back and 5s forwards as the new defaults. let me know if you have any trouble!
- the maximum number tracked by 'tag as number' system predicate is expanded from -99999999->99999999 to -2^63 -> 2^63 - 1. tag caches will be regenerated on update to store these, it will take a few minutes. the input ui for the system predicate is temporarily limited to -/+2^31, but I'll expand it
- subscriptions now have a checkbox for 'do not worry about subscription gaps'. if you have a subscription that gets files randomly, or gets an intentionally small sample, this will disable the 'hey, there was a gap, click here to fill it in' popup messages
- you can now set negative values for the duplicate score weights in options->duplicates
- also added a weight for the 'nice ratio' duplicate comparison
- vastly improved the cancel speed for searches in the realm of 'get the files that have any xxx tag', be that all tags or a namespace wildcard, and also some important search setup for various 'all known files' search pages. if you have ever tried to search the PTR raw and run into a three minute uncancellable initial setup lag, it should be gone now!
- when you right-click on files in a specific local file domain (e.g. trash or my files), the 'view this file's dupes' number check is now run on 'all local files' as well, and if the numbers differ, a second menu is shown for all local files. this should make it easier to chase dupes of trashed files that are still untrashed while also allowing a trash-only search
- fixed a critical bug in repository mapping processing that was not adding mappings to certain caches for files imported before the repo was added, and/or the new repository 'per content type' processing reset. this mostly manifested as these files not showing up in search results despite the tag being there. there is more work to do here, so it is top priority next week, and likely some maintenance to regen the bad caches
- .
- boring rewrites for multiple local file services:
- many users have asked for the ability, when multiple local file domains are available, to search multiple file domains at once. I spent time this week doing background work to support this, and a related concept of searching 'deleted' files, which is a current gap in the program and not always covered by 'all known files'. nothing significant changes, but almost all the file search code now works on n file domains rather than 1, but for now n=1 lmao
- made a new 'database search context' object to handle a virtualised but still simple and fast file 'location view' at the database level
- the primary file search call is converted to use this object. references to a single file service are replaced with the view or its components
- all duplicate file search code is moved to the new location search context
- searching by 'system:import time' now works over multiple domains, although it is a little muddled. in future, import time predicate will have an optional specific file service and do 'import time' vs 'deleted time'
- 'system:local' and 'system:not local' are adapted so they can still work fast with multiple file domains, sped up worst case 'local' time, and a new optimisation lets them run fast for 'deleted from local files' too
- sorting search files by import time is now only supported if the search domain is just one domain

next week

While I was working on database stuff this week, I discovered a problem with repository processing that meant files imported before a repository was added (or mappings reset, with the new 'reset by content type' function) would not get fully processed and may not show up in tag searches. I fixed half of that bug this morning, but there is more to do, so that's top priority.

Otherwise, it should be a 'medium size' job week. I really want to get note parsing in the downloader done, but that may take longer, so I'll just see what I can chip away at.
I had a great week, ending up focusing mostly on bug fixes. The 'jank line' tile artifact in the image viewer is fixed for all normal zooms, and I also fixed the long-time tag search issue I discovered last week. There's also some quality of life improvements, and OR searching is added to the Client API.

The release should be as normal tomorrow.

hey I'm trying to use the url downloader and it won't parse this url e621.net/posts/2662369

Where do I go or what do I do? Who knows how many downloads were affected.
Not sure if this is a bug or if I'm missing something, but I screwed up and ran out of space on the disk I keep my DB on. It's not the first time this has happened, but this time it seems to have broken Hydrus somewhat.

It takes a long time to start now, and then it works slowly but normally for a while before it decides it's out of space again, at which point I can't make any changes, like deleting files to free up space.

I've used the moments where it does work to delete some of the bigger files to free up 25 GB. This is 5% of the entire disk, and although I think it's made it a little better, meaning it takes a bit longer for it to think it's run out of space, it hasn't actually fixed the problem.

Is there a way to fix this other than replacing the database file entirely and reimporting the nearly 500 GB worth of client files?
Capture.PNG (9.15 KB, 574x116)
I'm following what this message is telling me to do. I don't know how or when this popped up, but I just noticed it when I looked at Hydrus today. I'm on v434, Win10 19042.
I'd like to start this post off by saying that I'm using the AUR version of Hydrus and I know there isn't any real support for Linux builds in general, but I'm still posting this here in case someone else had the same problem, or possibly even knows what fucked up and how to fix it. Alright -

Dark mode seemed to just stop working for me one day. It looks exactly like the default white color scheme except the backdrop for image preview areas is the dark gray color that dark mode typically uses. I checked the "colours" in options and everything seems to be fine too (although I don't think you can change the color of the hydrus window itself through there anyway). This just kind of suddenly started happening after booting up Hydrus one day. It wasn't after updating or anything like that either.

https://youtube.com/watch?v=fLLKwIfs1NM
windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v450/Hydrus.Network.450.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v450/Hydrus.Network.450.-.Windows.-.Installer.exe
macOS
app: https://github.com/hydrusnetwork/hydrus/releases/download/v450/Hydrus.Network.450.-.macOS.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v450/Hydrus.Network.450.-.Linux.-.Executable.tar.gz

🎉🎉 MERRY 450! 🎉🎉

I had an ok week. Last week's experimental release went well, so I have polished that, and I otherwise caught up with a variety of small work.

If you sync with the PTR, update will take a couple of minutes.

all misc this week

So, the update storage change went ok! There were a couple of little sync bugs to clear up, but overall it works--tag repositories now track their processing progress by mappings, siblings and parents separately. You don't have to do anything, and this doesn't matter much for day to day work, but it allows for individualised reset and reprocessing. All v448 users will have their siblings and parents reset and reprocessed, which will take a couple of minutes to do on update, and about fifteen minutes on your next processing job to fill back in, and which should eliminate some bad siblings and parents due to years-old processing bugs that long term users have been dealing with (leaving only current bugs, which I am also working on). The reset will not delete any pending siblings or parents you have, so no worries if you have a bunch waiting to be uploaded.

Advanced Users: The PTR sibling and parent reset will however remove any siblings and parents you uploaded that were then denied by jannies (which your client would have added to itself anyway). Everyone is reset to a 'clean' sync with this change, so if you know you have a ton of surplus denied siblings you rely on, perhaps from years ago that we agreed I would deny on the PTR to help you hack in an overwrite in the old system, you might like to hold off updating and first figure out a PTR sibling/parent backup to a local service using tag migration.

I fixed some things with Mr Bones. His numbers are accurate to your 'my files' again, and he now talks about your total deleted files and also your earliest file import time. I divided the ugly growing stack of numbers into tabs, which I am sort of happy with, sort of not. People like to take screens of Mr Bones, but they have different preferences on what to show, so I may replace this with expand/collapse buttons or similar, so you can show everything if you want.

If you use the export files window to get a lot of files out, it now makes a progress popup. You can close the window while it is exporting and still see and cancel the job.

The Client API file search now supports file and tag domain selection (like the 'my files' and 'all known tags' buttons on a normal search page), and also file sort for searches. I know the Client API guys have been waiting on this, so with luck we should see some neat new search options in the Client API programs in the near future.
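
Since these are just GET query arguments, they are easy to try by hand. Here is a minimal sketch in Python, assuming a local client with the Client API enabled--the access key, tags, and sort value here are illustrative assumptions, not defaults:

import json
import requests

API = 'http://127.0.0.1:45869'  # default Client API port
HEADERS = {'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY_HERE'}

params = {
    'tags': json.dumps(['samus aran', 'system:archive']),
    'file_service_name': 'my files',       # file domain, as on a search page
    'tag_service_name': 'all known tags',  # tag domain
    'file_sort_type': 2,                   # e.g. sort by import time
    'file_sort_asc': json.dumps(False),    # newest first
}

response = requests.get(API + '/get_files/search_files', headers=HEADERS, params=params)
print(response.json()['file_ids'])
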
full list

- misc:
- when exporting files from the file export window, a cancellable popup job with progress updates is also created. if you close the window, you can still cancel the job from the popup
- fixed a crash bug in file export window
- system:num file relationships (duplicates) now correctly only returns files in the current file search domain (previously, it returned all files, including those previously deleted etc...)
- I rearranged some of the file relationships actions in the thumbnail menu. I'm not really happy with this, but a shuffle is easier than a full rework
- fixed the '4k' resolution label replacer, which was looking at 2060 height not 2160 by mistake
- the phash generation routine (part of the duplicates system, happens on image imports) now uses less memory and CPU for images with an alpha channel (pngs and still gifs), and if those images are taller or wider than 1:30 or 30:1, the phashes are also better quality
- the 'fill in subscription gap' popup button now correctly boots its created downloader when the action also opens a new downloader page. previously, due to overactive safety code, it would hang on 'pending' until a client restart. related similar 'start downloader after creating page' actions off drag and drop or client api should also be more reliable
- .
- repositories (also the various improvements in 449-experimental are folded in):
- fixed an issue with some 'force repository account refresh' code not kicking in immediately
- when a client sees a repository's update period change, it now recalculates the metadata next check time
- fixed a bug with the new repo sync where updates just added from additive sync were not being processed until client restart. related long-term buggy 'do we have this hash in updates?' and 'how many updates are there?' tests for update metadata are also fixed
- the experimental by-content-type repository reset from last week now leaves pending content in place
- the reset also now clears cached service info counts for files, tags, and mappings
- .
- client api:
- the /get_files/search_files command now takes six new parameters for file/tag domain selection and file sort type and order
- I wrote out some simple help and added some hacky unit tests for these new parameters. it needs another pass for potential bug fixes and readability/specificity (e.g. what does 'asc' for 'sort by ratio' mean?), but let me know how you get on anyway
- fixed the new system predicate parsing for system:hash with only one hash
- improved the url system predicate examples in client api documentation
- client api version is now 19
- .
- mr bones:
- mr bones now reports the correct numbers for your 'my files' again (and will continue to do so as multiple local file services are added)
- mr bones now reports total files deleted and their total size
- mr bones now reports your earliest recorded file import time
- mr bones now has separate tabs for different stats types. this neatly ditches the giant stack of numbers this was becoming, but I may revisit it. some people who take mr bones screens will prefer all the info in one easy shot, while others I know would rather the 'viewing habits' stuff were not immediately there. maybe expanding boxes?
- fixed some mr bones layout
- .
- boring code cleanup:
- made a new base class for the different database modules to hold cursor and collect common administrative functions
- all database queries (about 1,200 of them) now go through a single location in the new class
- a new profile mode, 'query planner' mode, now prints query text and EXPLAIN QUERY PLAN lines to a new profile log. this is a new experimental thing, extremely spammy, but will help with diagnosing very unusually slow queries on individual clients (it'll most likely show up odd sqlite versions, weird data distributions, or un-analysed tables). there's a quick sketch of the underlying sqlite feature after this list
- updated a core function in 'all known files' mappings change autocomplete count adjustment. this seemed to have extremely bad worst case time, and I think it might have been giving some bad counts in unusual situations
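
To give a sense of what the query planner mode above records, here is the underlying SQLite feature it leans on--a self-contained sketch with a hypothetical table, not hydrus's actual logging code:

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE current_files ( service_id INTEGER, hash_id INTEGER );')
conn.execute('CREATE INDEX current_files_service ON current_files ( service_id );')

query = 'SELECT hash_id FROM current_files WHERE service_id = ?;'
for row in conn.execute('EXPLAIN QUERY PLAN ' + query, (1,)):
    print(row)  # e.g. (2, 0, 0, 'SEARCH current_files USING INDEX current_files_service (service_id=?)')
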
next week

Next week is cleanup. The long term database breakup job has been going well, making code simpler and easier to expand, so I think more of that. I also started a new database profiling system this week, and I want to experiment with it a bit.
I had a great work week. I fixed a heap of bugs!

The release should be normal time tomorrow.

https://youtube.com/watch?v=VEwVAV3VPw4
windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v448/Hydrus.Network.448.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v448/Hydrus.Network.448.-.Windows.-.Installer.exe
macOS
app: https://github.com/hydrusnetwork/hydrus/releases/download/v448/Hydrus.Network.448.-.macOS.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v448/Hydrus.Network.448.-.Linux.-.Executable.tar.gz

Hey, I am sorry, Endchan has been down or not allowing login/posting the past couple weeks when I needed to get the release out, so there was a gap.

I had an ok couple of weeks. I was pretty ill in the middle, but I got some good work done overall. .wav files are now supported, PSD files get thumbnails, vacuum returns, and the Client API allows much cleverer search.

client api

I have added some features to the Client API. It was more complicated than I expected, so I couldn't get everything I wanted done, but I think this is a decent step forward.

First off, the main 'search for files' routine now supports many system predicates. This is thanks to a user who wrote a great system predicate text parser a long time ago. I regret I am only catching up with his work now, since it works great. I expect to roll it into normal autocomplete typing as well--letting you type 'system:width<500' and actually getting the full predicate object in the results list to select.

If you are working with the Client API, please check out the extended help here:

https://hydrusnetwork.github.io/hydrus/help/client_api.html#get_files_search_files

There's a giant list of the current supported inputs. You'll just be submitting system predicates as text, and it handles the rest. Please note that this is a complicated system, and while I have plenty of unit tests and so on, if you discover predicates that should parse but are giving errors or any other jank behaviour, please let me know!
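
For example, predicate text now goes straight into the 'tags' parameter alongside normal tags. A minimal sketch, assuming a local client with the API enabled and an illustrative access key:

import json
import requests

HEADERS = {'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY_HERE'}
params = {
    'tags': json.dumps(['blue eyes', 'system:width<500', 'system:filesize > 10MB'])
}
r = requests.get('http://127.0.0.1:45869/get_files/search_files', headers=HEADERS, params=params)
print(r.json()['file_ids'])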

Next step here is to add file sort and file/tag domain.

Next there's a routine that lets you add files to arbitrary pages, just like a thumbnail drag and drop:

https://hydrusnetwork.github.io/hydrus/help/client_api.html#manage_pages_add_files

This is limited to currently open pages for now, but I will add a command to create an empty file page so you can implement an external file importer page.
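
A sketch of the round trip, using the real /manage_pages/get_pages and /manage_pages/add_files endpoints--the page chosen and the file ids are illustrative:

import requests

API = 'http://127.0.0.1:45869'
HEADERS = {'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY_HERE'}

# fetch the page tree and grab the first top-level page's key
pages = requests.get(API + '/manage_pages/get_pages', headers=HEADERS).json()
page_key = pages['pages']['pages'][0]['page_key']

# add some files to that page, just like a thumbnail drag and drop
requests.post(API + '/manage_pages/add_files', headers=HEADERS,
              json={'page_key': page_key, 'file_ids': [1, 2, 3]})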

misc

.wav files are now supported! They should work fine in mpv as well.

Simple PSD files now get thumbnails! It turns out FFMPEG can figure this out as long as the PSD isn't too complicated, so I've done it like for .swf files--if it works, the PSD gets a nice thumbnail, and if it doesn't it gets the file default icon stretched to the correct ratio. When you update, all existing PSDs will be queued for a thumbnail regen, so they should sort themselves out in the background.
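
If you want to test what FFMPEG can do with a PSD yourself, the call is presumably something in the shape of this sketch (the exact flags hydrus uses are my assumption), via Python's subprocess:

import subprocess

# ask ffmpeg to decode the PSD's composite image and write a single png frame
subprocess.run(
    ['ffmpeg', '-i', 'example.psd', '-frames:v', '1', 'thumbnail.png'],
    check=True,
)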

Thanks to profiles users sent in, I optimised some database code. Repository processing and large file deletes should be a little faster. I had a good look at some slow session save profiles--having hundreds of thousands of URLs in downloader pages currently eats a ton of CPU during session autosave--but the solution will require two rounds of significant work.

Database vacuum returns as a manual job. I disabled this a month or so ago as it was always a rude sledgehammer that never actually achieved all that much. Now there is some UI under database->db maintenance->review vacuum data that shows each database file separately with their current free space (i.e. what a vacuum will recover), whether it looks like you have enough space to vacuum, an estimate of vacuum time, and then the option to vacuum on a per file basis. If you recently deleted the PTR, please check it out, as you may be able to recover a whole ton of disk space.
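
For the curious, the 'current free space' figure for an SQLite file is the sort of thing the database reports directly. A sketch using the real PRAGMAs involved, though hydrus's own code is of course its own:

import sqlite3

def vacuum_estimate(db_path):
    conn = sqlite3.connect(db_path)
    page_size = conn.execute('PRAGMA page_size;').fetchone()[0]
    free_pages = conn.execute('PRAGMA freelist_count;').fetchone()[0]
    conn.close()
    return page_size * free_pages  # bytes a VACUUM should recover

print(vacuum_estimate('client.mappings.db'))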

I fixed Mr Bones! I knew I'd typo somewhere with the file service rewrite two weeks ago, and he got it. I hadn't realised how popular he was, so I've added him to my weekly test suite--it shouldn't happen again.
- in the downloader system, if a download object has any hashes, it now no longer consults urls for pre-import predictions. this saves a little time looking up urls and ensures that the logically stronger hashes take precedence over urls in all cases (previously, they only took precedence when a non-'looks new' status was found)
- fixed an ugly bug in manage tag siblings/parents where tags imported from clipboard or .txt were not being cleaned, so all sorts of garbage with capital letters or leading spaces could be entered. all pairs are now cleaned, and anything invalid skipped over
- the manage tag filter dialog now cleans all imported tag rules when using the 'import' button (issue #768)
- the manage tag filter dialog now allows you to export the current tag filter with the export button
- fixed the 'edit json parse rule' dialog layout so if you transition from a short display to a string match that has complicated controls, it should now expand properly to show them all
- I think I fixed an odd bug where when uploading pending mappings while more mappings were being added, the x/y progress could accurately but unhelpfully continually reset to 0/y, with an ever-decreasing y until it was equal to the value it had at start. y should now always grow
- hydrus servers now put their server header on a second header 'Hydrus-Server', which should allow them to be properly detectable through a proxy that overrides 'Server'. there's a quick check for this sketched after this list
- optimised a critical call in the tag mappings update database routine. for a service with many siblings and parents, I estimate repository processing is 2-7% faster
- optimised the 'add/delete file' database routines in multiple ways, particularly when the file(s) have many deleted tags, and for the local file services, and when the client has multiple tag services
- brushed up a couple of system predicate texts--things like num_pixels to 'number of pixels'
- .
- boring database refactoring:
- repository update file tracking and service id normalisation is now pulled out to a new 'repositories' database module
- file maintenance tracking and database-level file info updates is now pulled out to a new 'files maintenance' database module
- analyse and vacuum tracking and information generation is now pulled out to a new 'db maintenance' database module
- moved more commands to the 'similar files' module
- the 'metadata regeneration' file maintenance job is now a little faster to save back to the database
- cleared out some defunct/bad database code related to these two modules
- misc code cleanup, particularly around the stuff I optimised this week
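
On the 'Hydrus-Server' header item above, a quick outside check looks like this sketch--the address is hypothetical, and verify=False is just because many hydrus servers run self-signed certificates:

import requests

r = requests.get('https://my.hydrus.server:45871/', verify=False)
print(r.headers.get('Server'))         # a proxy may overwrite this one
print(r.headers.get('Hydrus-Server'))  # the new copy that survives the proxy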

next week

Next week is a 'medium job' week. To clear out some long time legacy issues, I want to figure out an efficient way to reset and re-do repository processing just for siblings and parents. If that goes well, I'll put some more time into the Client API.
I had a mixed week. I completed a long delayed maintenance routine for repositories, letting them track tags, siblings, and parents processing separately, but it proved much more complicated than I expected, and while I am happy with the work, I have nothing else to show. Since the change also touches core areas of repository processing, I want to do a limited beta test before I roll it out to everyone.

The release should be normal time tomorrow, but it will be an experimental release, only recommended for advanced users.
https://youtube.com/watch?v=boFZ3cAws20
windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v449/Hydrus.Network.449.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v449/Hydrus.Network.449.-.Windows.-.Installer.exe
macOS
app: https://github.com/hydrusnetwork/hydrus/releases/download/v449/Hydrus.Network.449.-.macOS.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v449/Hydrus.Network.449.-.Linux.-.Executable.tar.gz

I had a mixed week. I was able to get a long-planned maintenance routine completed, but that's all I have to show. This is an experimental release, only for advanced users who want to help me test it.

repository processing tracking

Since I haven't got anything really fun this week, and this changes something delicate, I only want advanced users to check it out for now. If you have experience with the program, run a regular backup, sync with the PTR or another repository, and want to help me out, then please update this week and use your repository normally. Let me know if you run into any trouble. One thing I noticed just now is my IRL client didn't want to catch up to some final processing until I restarted it.

Note this update will delete your pending siblings and parents, so commit before you update! I'll make it so it doesn't do this next week.

So, repositories now track their 'processing' status more cleverly. Hit services->review services to see--now the 'definitions' part of an update is separated, and the different contents a repository has (just files for a file repo, but mappings, siblings, and parents for a tag repo) also have separate tracking and pause buttons. Most of the time you'll see everything at the same progress, but now the client can do independent reset, so 'clear all siblings and then reprocess them', and it doesn't have to nuke and work through all your mappings as well.

This sounds simple, but it turns out it touches a bunch of core systems, and many were old and dusty. I brushed everything up, maybe fixed some little bugs or lags along the way, and added some neat reprocess commands to the 'review services' panel. All siblings and parents will be reset this week--part of chasing a long-time problem with non-deterministic sibling/parent processing I have been trying to pin down with the PTR janitors--but doing this reset now just takes a couple of seconds and shouldn't take more than a minute to reprocess.

There are some secondary cool things here--users can potentially now sync with the PTR just for the siblings and parents. It is still a little inefficient, since you are getting the tens of millions of definitions no matter what, but you can skip the 1.3 billion mappings if you want. I also feel more able to hang new tools off it, like a tag filter (e.g. 'get all the creator tags from PTR, but nothing else'), in future.

full list

- this is an experimental release! please do not use this unless you are an advanced user who has a good backup, syncs with a repository (e.g. the PTR), and would like to help me out. if this is you, I don't need you to do anything too special, just please use the client and repo as normal, downloading and uploading, and let me know if anything messes up during normal operation
- repository processing split:
- tl;dr: nothing has changed, you don't have to do anything. with luck, your PTR service is going to fix some bad siblings and parents over the next couple of days
- repositories now track what they have processed on a per-content basis. this gives some extra maintenance tools to, for instance, quickly reset and reprocess your ~150k tag siblings on the PTR without having to clear and reprocess all 1.3 billion mappings too
- in review services, you now see definition updates and all a repository's content types processing progress independently (just files for a file repo, but mappings, siblings, and parents for a tag repo). most of the time they will all be the same, but each can be paused separately. it is now possible (though not yet super efficient, since definitions still run 100%) to sync with the PTR and only grab siblings and parents by simply pausing mappings in review services
- I have also split the 'network' and 'processing' sync progress gauges and their buttons into separate boxes for clarity
- the 'fill in content gaps' maintenance job now lets you choose which content types to do it for
- also, a new 'reset content' maintenance job lets you choose to delete and reprocess by content type. the nuclear 'complete' reset is now really just for critical situations where definitions or database tables are irrevocably messed up
- all users have their siblings and parents processing reset this week. next time you have update processing, they'll come back in over about fifteen minutes, and with luck we'll wipe out some years-old legacy bugs and hopefully discover some info about the remaining bugs. most importantly, we can re-trigger this reprocess in just a few seconds to quickly test future fixes
- a variety of tests such as 'service is mostly caught up' are now careful just to test for the currently unpaused content types
- if you try to commit some content that is currently processing-paused, the client now says 'hey, sorry, this is paused, I won't upload that stuff right now' but still uploads everything else that isn't paused. this is a 'service is caught up' issue
- tag display sync, which does background work to make sure siblings and parents appear as they should, will now not run for a service if any of the services it relies on for siblings or parents is not process synced. when this happens, it is also shown on the tag display sync review panel. this stops big changes like the new sibling/parent reset from causing display sync to do a whole bunch of work before the service is ready and happy with what it has. with luck it will also smooth out new users' first PTR sync too
- clients now process the sub-updates of a repository update step in the order they were generated on the server, which _may_ fix some non-deterministic update bugs we are trying to pin down
- all update processing tracking is overhauled. all related code and database access techniques have been brushed up and should use less CPU and fail more gracefully

next week

This work knocked me out. I had half hoped it would be a simple little thing, just splitting one x/y into multiple, but instead it spiralled out into ten different 'ah, but what about that?' and 'man, that's actually been running bad for ages'. Rather than kick out garbage on a core system, I decided to give it some proper time and do extra IRL testing. However, I am behind on messages, recent bug reports, other small work, and the Client API, so I'll now get to that.
I had an ok week. The update storage change last week went well, so that is polished and ready for everyone. I also caught up on some small fixes and quality of life and extended the Client API a little further.

The release should be normal time tomorrow.

https://youtube.com/watch?v=asojeparbK0
windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v446/Hydrus.Network.446.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v446/Hydrus.Network.446.-.Windows.-.Installer.exe
macOS
app: https://github.com/hydrusnetwork/hydrus/releases/download/v446/Hydrus.Network.446.-.macOS.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v446/Hydrus.Network.446.-.Linux.-.Executable.tar.gz

I had an ok week. The client does not have a huge changelog this week, but the server has a neat privacy improvement.

The PTR is going to be doing a heap of maintenance this week. It will be 'busy' a lot, approximately one hour of busy and then three hours free. Please bear with it, and if you have a million mappings to upload, I recommend you just give it a break and come back later. I am not totally sure how long it will take. Best case it is a day, worst case it might take four or five.

null account

tl;dr: The hydrus server is now even more anon. You don't have to do anything.

As the PTR has moved to multiple accounts, we've had several good discussions about privacy. Separate accounts, despite being anon, could potentially leave a fingerprint of preferences on the server. If the server were to fall into bad hands many years from now, someone could mine those records--maybe mixed with one time you casually said 'yeah, I added that sibling'--and perhaps derive something from it.

There is no technical need to remember which account uploaded what long term, so now all hydrus servers completely anonymise their content after a certain duration, default 90 days. A new non-usable 'null' account takes possession of files, tag mappings, siblings, or parents after the delay, letting the original uploaders be forgotten. Janitors will still have time to work on recent account-based problems, but the historical record works just like the old shared public account: all merged together.
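
Mechanically, the idea is just a periodic re-attribution of old rows to the null account. In pseudo-SQL terms it is something like this sketch--the table and column names are hypothetical, not the real hydrus server schema:

ANONYMISE_MAPPINGS = '''
UPDATE current_mappings
   SET account_id = :null_account_id
 WHERE timestamp < :now - :anonymisation_period;  -- default 90 days
'''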

I have updated the privacy document in the help to talk a little about it. As long as you do not tag your own face in pictures or something, I think we are pretty great now, especially if you use a VPN.

https://hydrusnetwork.github.io/hydrus/help/privacy.html#account_history

It will take some time to anonymise the PTR or any other big server, as it has to go through its whole historical record to catch up. Please bear with it.
 >>/1092/
Thanks for the reply. Great program you've made. It's really revealed that the biggest obstacle to organisation was always me. Even with hydrus I can't be fucked to tag most of my shit.
 >>/1103/
Yeah, I find it difficult too. The PTR is great, if you can devote the SSD space for it. But you also need to set up workflows and so on just to get through all your inbox. I really haven't figured out the answer there, and for a decade I guess I've slowly seen all my other queues turn infinite too--I add more films every month to my queue than I watch, more vidya than I can play, more music than I can listen to, and more little files through hydrus than I can process.

If everything goes tits up but I end up in a functioning bunker, I'll probably be set for twenty years and finally catch up, ha ha ha.
I had a great week. I reworked how file status is stored in the database, greatly accelerating many functions all over the program, particularly on very large clients. I also fixed some annoying bugs and added some quality of life.

The release should be as normal tomorrow.
 >>/1103/
the key to tagging is not to tag fully, but to work in waves. 

mine is 
safe / suggestive / explicit - this is a hard gate, so if it doesn't have this tag, it's not going to the next tag section
real / drawn - another hard gate, but it's FAR too useful for me to not get this in on the first parse go
funny
favorite
tag further - this implies there is something special there, usually accompanied by favorite, that I need to pay attention to
art
and a few other 'series' gates

from here it gets more specialized. this method lets me tag 1 file every second or so and makes sure all files are 'new' files, so I'm not wondering about duplicates and shit like that. that said, my method relies on me being done with duplicate parsing, which is currently at 1.7 million files, so fuck me. 

my next method, for mass culling rather than tagging, is a 10 star system. 

with one of the updates I can progressively add ratings with a hotkey, so they start out at 0 stars
they move up to 'seen it, 0 stars', and each time I 'see it' 1 star gets added, and when files make it to 10 stars, they move to archive. 

this method was devised due to duplicates taking FAR too long for me to go through, and I wanted to do something other than a/b files. this makes it very easy to go through things quickly, and then with collections exclude what I have already seen. granted, I have opinions on how that is handled and better ways to potentially do it, but it's a good start.
 >>/1099/
I've been meaning to help out with the PTR. I guess this is a reminder. Is it ok to use tor as a proxy for the ptr? I haven't tried yet but is it blocked or are there any security concerns? I really want to start petitioning these random images that are coming up for some of my tag searches.

https://youtube.com/watch?v=ulvvQHD4184
windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v445/Hydrus.Network.445.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v445/Hydrus.Network.445.-.Windows.-.Installer.exe
macOS
app: https://github.com/hydrusnetwork/hydrus/releases/download/v445/Hydrus.Network.445.-.macOS.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v445/Hydrus.Network.445.-.Linux.-.Executable.tar.gz

I had a great week mostly working on optimisations and cleanup. A big busy client running a lot of importers should be a little snappier today.

optimisations

Several users have had bad UI hangs recently, sometimes for several seconds. It is correlated with running many downloaders at once, so with their help I gathered some profiles of what was going on and trimmed and rearranged some of the ways downloaders and file imports work this week. There is now less stress on the database when lots of things are going on at once, and all the code here is a little more sensible for future improvements. I do not think I have fixed the hangs, but they may be less bad overall, or the hang may have been pushed to a specific trigger like file loads or similar.

So there is still more to do. The main problem I believe is that I designed the latest version of the downloader engine before we even had multiple downloaders per page. An assumed max of about twenty download queues is baked into the system, whereas many users may have a hundred or more sitting around, sometimes finished/paused, but in the current system each still taking up a little overhead CPU on certain update calls. A complete overhaul of this system is long overdue but will be a large job, so I'm going to focus on chipping away at the worst offenders in the meantime.

As a result, I have improved some of the profiling code. The 'callto' profile mode now records the UI-side of background jobs (when they publish their results, usually), and the 'review threads' debug dialog now shows detailed information on the newer job scheduler system, which I believe is being overwhelmed by micro downloader jobs in heavy clients. I hope these will help as I continue working with the users who have had trouble, so please let me know how you get on this week and we'll give it another round.

the rest

I fixed some crazy add/delete logic bugs in the filename tagging dialog and its 'tags just for selected files' list. Tag removes will stick better and work more precisely on the current selection.

If you upload tags to the PTR and notice some lag after it finishes, this should be fixed now. A safety routine that verifies everything is uploaded and counted correctly was not working efficiently.

I fixed viewing extremely small images (like 1x1) in the media viewer. The new tiled renderer had a problem with zooms greater than 76800%, ha ha ha.

A bunch of sites with weird encodings (mostly old or japanese) should now work in the downloader system.

Added a link, https://github.com/GoAwayNow/Iwara-Hydrus, to Iwara-Hydrus, a userscript to simplify sending Iwara videos to Hydrus Network, to the Client API help.

If you are a Windows user, you should be able to run the client if it is installed on a network location again. This broke around v439, when we moved to the new github build. It was a build issue with some new libraries.
full list

- misc:
- fixed some weird bugs on the pathname tagging dialog related to removal and re-adding of tags with its 'tags just for selected files' list. previously, in some circumstances, all selected paths could accidentally share the same list of tags, so further edits on a subset selection could affect the entire former selection
- furthermore, removing a tag from that list when the current path selection has differing tags should now successfully just remove that tag and not accidentally add anything
- if your client has a pending menu with 'sticky' small tag count that does not seem to clear, the client now tries to recognise a specific miscount cause for this situation and gives you a little popup with instructions on the correct maintenance routine to fix it
- when a pending upload ends, it is now more careful about when it clears the pending count. this is a safety routine, but it is not always needed
- when the pending count is recalculated from source, it now uses the older method of counting table rows again. the new 'optimised' count, which works great for current mappings, was working relatively slowly for the pending count on large services like the PTR
- fixed rendering images at >76800% zoom (usually 1x1 pixels in the media viewer), which had broke with the tile renderer
- improved the serialised png load fix from last week--it now covers more situations
- added a link, https://github.com/GoAwayNow/Iwara-Hydrus, to Iwara-Hydrus, a userscript to simplify sending Iwara videos to Hydrus Network, to the client api help
- it should now again be possible to run the client on Windows when the exe is in a network location. it was a build issue related to modern versions of pyinstaller and shiboken2
- thanks to a user's help, the UPnPc executable discoverer now searches your PATH, and also searches for 'upnpc' executable name as a possible alternative on linux and macOS
- also thanks to a user, the test script process now exits with code 1 if the test is not OK
- .
- optimisations:
- when a db job is reading data, if that db job happens to fall on a transaction boundary, the result is now returned before the transaction is committed. this should reduce random job lag when the client is busy
- greatly reduced the amount of database time it takes to check if a file is 'already in db'. the db lookup here is pretty much always less than a millisecond, but the program double-checks against your actual file store (so it can neatly and silently fill in missing files with regular imports), however on an HDD with a couple million files, this could often be a 20ms request! (some user profiles I saw were 200ms!!! I presume this was high latency drives, and/or NAS storage, that was also very busy at the time). since many download queues will have bursts of a page or more of 'already in db' results (from url or hash lookups), this is why they typically only run 30-50 import items a second these days, and until this week, why this situation was blatting the db so hard. the path existence disk request is pulled out of precious db time, allowing other jobs to do other db work while the importer can wait for disk I/O on its thread. I suspect the key to getting the 20ms down to 8ms will be future granulation of the file store (more than 256 folders when you have more than x files per folder, etc...), which I have plans for. I know this change will de-clunk db access when a lot of importers are working, but we'll see this week if the queues actually process a little faster since they can now do file presence checks in parallel and with luck the OS/disk will order their I/O requests cleverly. it may or may not relieve the UI hangs some people have seen, but if these checks are causing trouble it should expose the next bottleneck
- optimised a small test that checks if a single tag is in the parent/sibling system, typically before adding tags to a file (and hence sometimes spammed when downloaders were working). there was a now-unneeded safety check in here that I believe was throwing off the query planner in some situations
- the 'review threads' debug UI now has two new tabs for the job schedulers. I will be working with UI-lag-experiencing users in future to see where the biggest problems are here. I suspect part of it will be overhead from downloader thread spam, which I have more plans for
- all jobs that threads schedule on main UI time are now profiled in 'callto' profile mode
- .
- site encoding fixes:
- fixed a problem with webpages that report an encoding for which there is no available decoder. This error is now caught properly, and if 'chardet' is available to provide a supported encoding, it now steps in and fixes things automatically (a rough sketch of the fallback chain is after this list). for most users, this fixes japanese sites that report their encoding as "Windows-31J", which seems to be a synonym for Shift-JIS. the 'non-failing unicode decode' function here is also now better at not failing, ha ha, and it delivers richer error descriptions when all attempts to decode are non-successful
- fixed a problem detecting and decoding webpages with no specified encoding (which defaults to windows-1252 and/or ISO-8859-1 in some weird internet standards thing) using chardet
- if chardet is not available and all else fails, windows-1252 is now attempted as a last resort
- added chardet presence to help->about. requests needs it atm so you likely definitely have it, but I'll make it specific in requirements.txt and expand info about it in future
- .
- boring code cleanup:
- refactored the base file import job to its own file
- client import options are moved to a new submodule, and file, tag, and the future note import options are refactored to their own files
- wrote a new object to handle current import file status in a better way than the old 'toss a tuple around' method
- implemented this file import status across most of the import pipeline and cleaned up a heap of import status, hash, mime, and note handling. rarely do downloaders now inspect raw file import status directly--they just ask the import and status object what they think should happen next based on current file import options etc...
- a url file import's pre-import status urls are now tested main url first, file url second, then associable urls (previously it was pseudorandom)
- a file import's pre-import status hashes are now tested sha256 first if that is available (previously it was pseudorandom). this probably doesn't matter 99.998% of the time, but for, say, hitting 'try again' on a watcher import that failed on a previous boot and also had a dodgy hash parser, it might
- misc pre-import status prediction logic cleanup, particularly when multiple urls disagree on status and 'exclude previously deleted' is _unchecked_
- when a hash gives a file pre-import status, the import note now records which hash type it was
- pulled the 'already in db but doesn't actually exist on disk' pre-import status check out of the db, fixing a long-time ugly file manager call and reducing db lock load significantly
- updated a host of hacky file import unit tests to less hacky versions with the new status object
- all scheduled jobs now print better information about themselves in debug code
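
As mentioned in the site encoding fixes above, the fallback chain is roughly this shape--a sketch assuming 'chardet' is installed, not hydrus's actual function:

import codecs
import chardet

def decode_page(raw_bytes, declared_encoding=None):
    # 1. try the declared encoding, if python actually has a decoder for it
    if declared_encoding is not None:
        try:
            codecs.lookup(declared_encoding)
            return raw_bytes.decode(declared_encoding)
        except (LookupError, UnicodeDecodeError):
            pass  # e.g. 'Windows-31J', which python does not know by that name
    # 2. let chardet guess from the raw bytes
    guess = chardet.detect(raw_bytes).get('encoding')
    if guess is not None:
        try:
            return raw_bytes.decode(guess)
        except UnicodeDecodeError:
            pass
    # 3. last resort: windows-1252, the de facto web default
    return raw_bytes.decode('windows-1252', errors='replace')
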
next week

Next week is a 'medium size job' week. I would like to do some server work, particularly writing the 'null account' that will inherit all content ownership after a certain period, completely anonymising history and improving long-term privacy, and then see if I can whack away at some janitor workflow improvements.
I had an ok week. I mostly focused on a new privacy improvement for repositories like the PTR, but otherwise I fixed a few bugs and improved some downloader UI.

The release should be as normal tomorrow.

https://youtube.com/watch?v=0OmOQs0ziSU
This whole vid is great. Watching a ton of this stuff was a decent whack of my vacation.
windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v444/Hydrus.Network.444.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v444/Hydrus.Network.444.-.Windows.-.Installer.exe
macOS
app: https://github.com/hydrusnetwork/hydrus/releases/download/v444/Hydrus.Network.444.-.macOS.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v444/Hydrus.Network.444.-.Linux.-.Executable.tar.gz

I had an ok week getting back to speed. There's some new help regarding the new PTR accounts and I fixed some bugs.

accounts and help

If you use the PTR, you may have seen a note appear on the public account that it is now 'read only' and can now only download updates. If you would like to upload tags, siblings, or parents, please generate your own account either by clicking the popup button you get when trying to upload, or just hit up the PTR on services->manage services and click the easy button there. This pretty much completes the long project me and the guys running the PTR have been working on: moving all uploading users to their own still-anon-but-separate accounts so the janitors can work through groups of petitions and undo future mistakes without it all being merged together in one account. I have given both of these pages a complete pass to represent the changes and updated screenshots etc... to be more modern:

https://hydrusnetwork.github.io/hydrus/help/access_keys.html
https://hydrusnetwork.github.io/hydrus/help/privacy.html

The privacy page now compiles the various conversations I have had with users over time and most recently about this account change, and my current understanding of exactly how private a hydrus repo is (really good), and what you can do to make it almost perfect, even against esoteric problems or an untrustworthy administrator (basically using a VPN is always a good idea when you are doing anything fun online, and don't tag your private photos on the PTR with your real name lol). I have tried to be comprehensive and even tried to imagine some advanced future problems, so it is a long read, but I'll try to keep updating all my thoughts to there so it can be a good reference point in future. If you are interested in this stuff, please read it and let me know what I have missed or if you have any other ideas!

Also, if you are a Linux user and get OOM (out of memory) crashes, please check the new thorough user-written guide linked in this new 'running' section here:

https://hydrusnetwork.github.io/hydrus/help/getting_started_installing.html#running
https://hydrusnetwork.github.io/hydrus/help/Fixing_Hydrus_Random_Crashes_Under_Linux.md

all misc otherwise this week

I rewrote a heap of thread-UI object interactions that should improve stability on some Linux flavours and also get rid of the 'QtDeadWindow' errors some users have seen (usually in manage tags).

The new popup message button that fills in subscription gaps would sometimes not assign the correct downloader (usually when the button was opening a new page). This should be fixed!

A variety of error reporting is more reliable and presents better text.

An odd error when loading the downloader .png files is fixed, so if you were unable to import new downloaders in the past few weeks, please try again and let me know how you get on!
full list

- gave the 'access keys' and 'privacy' help pages a complete pass. the access keys section talks about the read-only shared key, and how to generate your own account, and the privacy section now compiles, as comprehensively as I could, our various discussions about multiple accounts, what you shouldn't upload to the PTR (basically your own name lol), self-signed https certificates, and what information is actually stored on an account
- expanded the 'getting started - installing' help page with a 'how to run the client' section, including bundling the excellent Linux virtual memory guide written by a user
- fixed the new 'fill in subscription with gap downloader' button, which was initialising with the wrong downloader at times (usually on the first gap downloader opened, when it opened a new page with it)
- you can now set 'all known files' for the tag autocomplete in 'write' contexts (e.g. manage tags dialog) when not in advanced mode
- cleaned up how a variety of delayed UI calls are registered and present information about themselves. every UI job now has a nice human name for debug purposes. this should improve program stability and clear some odd rare errors when closing some dialogs (this mostly affected certain linux users)
- when an asynchronous UI job fails with a dead window, or if it fails to publish to its window for a non-dead reason and then the window dies before that failure returns, the error handling code now catches and silences the error. an example of this would be clicking 'refresh account' on review services, then closing the window before the lagging job raises 'connection failure'
- when windows are rescued from off screen, their frame key is now stated in the popup note
- if your version of OpenCV is unable to load PNG files, your client should now be able to load serialised object PNGs (like those in the downloader system) correctly (the same PIL fallback for regular media files now works for deserialisation too)
- the hydrus log path is finally month-zero-padded, ha ha ha
- misc cleanup and label fixes

next week

I got caught up in some behind the scenes admin work this week (mostly fun updating my IDE to python 3.8), and the help docs, which I really wanted to fill out, so I regret my actual changelog is so light. Should be back to full steam next week. I'll hammer away at the reports and profiles I have of lagging UI and get back to grinding at multiple local file services.
Does the watcher support watching twitter profiles? If not, what do you recommend for scraping twitter?
 >>/1091/
Try the 'subscriptions' system:

https://hydrusnetwork.github.io/hydrus/help/getting_started_subscriptions.html

For twitter, which can be pretty quick, I recommend setting the 'checker options' on the subscription so it checks at least once a day.
I had a great week mostly working on optimisation and cleanup. Busy clients should be a bit snappier. I also fixed some bugs!

The release should be as normal tomorrow.

https://youtube.com/watch?v=NgYIIPszZjA
windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v443/Hydrus.Network.443.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v443/Hydrus.Network.443.-.Windows.-.Installer.exe
macOS
app: https://github.com/hydrusnetwork/hydrus/releases/download/v443-macos/Hydrus.Network.443-macos.-.macOS.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v443/Hydrus.Network.443.-.Linux.-.Executable.tar.gz

I had a great week doing nice cleanup and quality of life work.

Hey, we had a problem getting the macOS release to build this week. The macOS link above goes to a build using a simpler and faster method. It should work fine, but please let me know if you have any trouble. As always, back up before you update!

highlights

Popup messages can now launch complex jobs from a button. The first I've added is when a subscription hits its 'periodic' file limit. The situation itself is now better explained, and a button on the popup will create a new downloader page with the specific query set up with an appropriate file limit to fill in the gap. The second is if you try to upload some content to a repository that your account does not have permission for (this is affecting sibling- and parent-uploading PTR users as the shared public account is changing), the popup message that talks about the issue now has a button that takes you straight to the manage services panel for the service and starts automatic account creation.

Subs should now be more careful about determining when they have 'caught up' to a previous sync. Small initial file limits are respected more, and the 'caught up' check is now more precise with sites that can give multiple files per URL or very large gallery pages.

I gave options->speed and memory a full pass. The layout is less crushed and has more explanation, the options all apply without needing a client restart, and the new, previously hardcoded cache/prefetch thresholds are now exposed and explained. There's a neat thing that gives an example resolution of what will be cached or prefetched, like 'about a 7,245x4,075 image', that changes as you fiddle with the controls.

The client has recently had worse UI lag. After working with some users, the biggest problems seemed to come in a session with lots of downloaders. I traced the cause of the lag and believe I have eliminated it. If you have had lag recently, a second or two every now and then, please let me know how things are now.

If you use the Client API a lot while the client is minimised, you can now have it explicitly prohibit 'idle mode' while it is working under options->maintenance and processing.
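
The 'about a 7,245x4,075 image' guidance is simple arithmetic on the cache size. A rough sketch of how such a figure can be derived--the 3-bytes-per-pixel and 16:9 numbers are my assumptions for illustration, not necessarily hydrus's exact internals:

import math

def example_resolution(cache_bytes, fraction, bytes_per_pixel=3, ratio=16/9):
    budget = cache_bytes * fraction        # biggest bitmap the cache will accept
    num_pixels = budget / bytes_per_pixel  # uncompressed pixel count
    width = math.sqrt(num_pixels * ratio)
    return int(width), int(width / ratio)

# e.g. a 400MB image cache that will cache any image under 25% of its size
print(example_resolution(400 * 1024 * 1024, 0.25))  # roughly (7882, 4433)
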
full list

- quality of life:
- when subscriptions hit their 'periodic file limit', which has always been an overly technical term, the popup message now explains the situation in better language. it also now provides a button to automatically fill in the gap via a new gallery downloader page called 'subscription gap downloaders' that gets the query with a file limit five times the size of the sub's periodic download limit
- I rewrote the logic behind the 'small initial sync, larger periodic sync' detection in subscription sync, improving url counting and reliability through the third, fourth, fifth etc... sync, and then generalised the test to also work without fixed file limits and for large-gallery sites like pixiv, and any site that has URLs that often produce multiple files per URL. essentially, subs now have a nice test for appropriate times to stop url-adding part way through a page (typically, a sub will otherwise always add everything up to the end of a page, in order to catch late-tagged files that have appeared out of order, but if this is done too eagerly, some types of subs perform inefficiently)
- this matters for PTR accounts: if your repository account does not have permissions to upload something you have pending, the popup message talking about this now hangs around for longer (120 seconds), explains the issue better, and has a button that will take you directly to the _manage services_ panel for the service and will hit up 'check for auto-account creation'
- in _manage services_, whenever you change the credentials (host, port, or access key) on a restricted service, that service now resets its account to unknown and flags for a swift account re-fetch. this should solve some annoying 'sorry, please hit refresh account in _review services_ to fix that manually' problems
- a new option in maintenance and processing allows you to disable idle mode if the client api has had a request in the past x minutes. it defaults disabled
- an important improvement to the main JobScheduler object, which farms out a variety of small fast jobs, now massively reduces Add-Job latency when the queue is very busy. when you have a bunch of downloaders working in the background, the UI should have much less lag now
- the _options->speed and memory_ page has a full pass. the thumbnail, image, and image tile caches now have their own sections, there is some more help text, and the new but previously hardcoded 10%/25% cache and prefetch limits are now settable and have dynamic guidance text that says 'about a 7,245x4,075 image' as image cache options change
- all the cache options on this page now apply instantly on dialog ok. no more client restart required!
- .
- other stuff, mostly specific niche work:
- last week's v441->442 update now has a pre-run check for free disk space. users with large sessions may need 10GB or more of free space to do the conversion, and this was not being checked. I will now try to integrate similar checks into all future large updates
- fixed last week's yandere post parser link update--the post url class should move from legacy moebooru to the new yandere parser correctly
- the big maintenance tasks of duplicate file potentials search and repository processing will now take longer breaks if the database is busy or their work is otherwise taking a long time. if the client is cluttered with work, they shouldn't accidentally lag out other areas of the program so much
- label update on ipfs service management panel: the server now reports 'nocopy is available' rather than 'nocopy is enabled'
- label update on shortcut: 'open a new page: search page' is now '...: choose a page'
- fixed the little info message dialog when clicking on the page weight label menu item on the 'pages' menu
- 'database is complicated' menu label is updated to 'database is stored in multiple locations'
- _options->gui pages->controls_ now has a little explanatory text about autocomplete dropdowns and some tooltips
- migrate database dialog has some red warning text up top and a small layout and label text pass. the 'portable?' column is now 'beneath db?'
- the repository hash_id and tag_id normalisation routines have two improvements: the error now shows the specific service_ids that failed to look up, and the mass-service_hash_id lookup now handles the situation where a hash_id is mapped by more than one service_id
- repository definition reprocessing now corrects bad service_id rows, which will better heal clients that previously processed bad data
- the client api and server in general should be better about giving 404s on certain sorts of missing files (it could dump out with 500 in some cases before)
- it isn't perfect by any means, but the autocomplete dropdown should be a _little_ better about hiding itself in float mode if the parent text input box is scrolled off screen
- reduced some lag in image neighbour precache when the client is very busy
- .
- boring code cleanup:
- removed old job status 'begin' handling, as it was never really used. jobs now start at creation
- job titles, tracebacks, and network jobs are now get/set in a nicer way
- jobs can now store arbitrary labelled callable commands, which in a popup message becomes a labelled button
- added some user callable button tests to the 'make some popups' debug job
- file import queues now have the ability to discern 'master' Post URLs from those that were created in multi-file parsing
- wrote the behind the scenes guts to create a new downloader page programmatically and start a subscription 'gap' query download
- cleaned up how different timestamps are tracked in the main controller

next week

I am now on vacation for a week. I'm going to play vidya, shitpost the limited E3, listen to some long music, and sort out some IRL stuff.

v444 should therefore be on the 23rd. I'll do some more cleanup work and push on multiple local file services.

Thank you for your support!
I had an ok week getting back up to speed. I got caught up in some admin and help doc work, so my changelog is a little light, but I was able to clear out some bugs and other annoyances.

The release should be as normal tomorrow.

https://youtube.com/watch?v=bpEFn3MFyfA
windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v442/Hydrus.Network.442.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v442/Hydrus.Network.442.-.Windows.-.Installer.exe
macOS
app: https://github.com/hydrusnetwork/hydrus/releases/download/v442/Hydrus.Network.442.-.macOS.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v442/Hydrus.Network.442.-.Linux.-.Executable.tar.gz

I had a great week. An important part of GUI Sessions is overhauled, which should save a lot of hard drive time for larger clients.

gui sessions

I always encourage a backup before you update, but this week it matters more than normal. If you have a client with large sessions with many important things set up, make sure you have a backup done before you update! I feel good about the code, and I try to save data on various failures, but if your situation gives errors for an unforeseen reason, having the backup ready reduces headaches all around!

Like the subscriptions and network objects breakups I've done in the past year, I 'broke up' the monolithic GUI Session object this week. Now, when your session has changes, only those pages that have changed will be saved, saving a ton of CPU and HDD write I/O. Furthermore, sessions that share duplicate pages (this happens all the time with session backups) can now share that stored page, saving a bunch of hard drive space too. Like with subscriptions, some users are pushing multiple gigabytes of session storage total, so there is a good amount of work to save here.
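
The deduplication trick here is essentially content-addressed storage: serialise each page, hash the blob, and store each unique blob once, with sessions only referencing hashes. A minimal sketch of the idea (my illustration; the real storage lives in the database and differs in detail):

    import hashlib, json

    page_store = {}  # hash -> serialised page blob, each stored only once

    def save_page(page_data):
        blob = json.dumps(page_data, sort_keys=True).encode()
        key = hashlib.sha256(blob).hexdigest()
        page_store.setdefault(key, blob)  # duplicate pages are free
        return key

    def save_session(pages):
        # a session is now just an ordered list of page hashes, so an
        # unchanged page costs nothing to 'save' again
        return [save_page(page) for page in pages]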

You don't have to do anything here. Everything works the same on the front end, and all your existing sessions will be converted on update. Your client should be a little less laggy at times, and client shutdown should be a bit faster.

If any of your old sessions fail to load or convert, a backup will be made so we can check it out later. Let me know if you have any trouble!

Advanced stuff:

Another benefit is that the old limit of 'sessions fail to save at about 500k session weight' now applies to pages individually. Please don't immediately try to nuke your sessions with five million new things, but if you do end up with a big session, let me know how other performance works out for you. Now that this bottleneck is gone, we'll start hitting new ones. I believe the next biggest vulnerability is thread starvation with many simultaneous downloaders, so again please don't paste-spam a hundred new queries (for now).

If you have been tracking session weight (under the pages menu), I am rebalancing the weights. Before, the weight was file = 1, URL = 1, but after all our research into this, I am setting it to file = 1, URL = 20. In general, I think a page will fail to save at the new weight of about 10 million. If you are in advanced mode, you can now see each page's weight on page tab right-clicks. Let's get a new feeling for the IRL distribution here, and we can aim for the next optimisation (I suspect it'll eventually be a downloader-page breakup, storing every query or watcher as a separate object). Since URLs seem to be the real killer, see if you can spread bigger downloads across multiple download pages, and try to clear out larger completed queries when you can.
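
In other words, a page's weight is now just files plus twenty times URLs. A quick illustrative calculation:

    FILE_WEIGHT, URL_WEIGHT = 1, 20
    PAGE_SAVE_LIMIT = 10_000_000  # the approximate new per-page ceiling

    def page_weight(num_files, num_urls):
        return num_files * FILE_WEIGHT + num_urls * URL_WEIGHT

    # e.g. a downloader page holding 50,000 files and 400,000 queued urls:
    print(page_weight(50_000, 400_000))  # 8,050,000 -- closing in on the limit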

the rest

I did a bunch of little stuff--check the changelog if you are interested.

I have also turned off the interval VACUUM maintenance and hidden the manual task for now. This was proving less and less useful in these days of huge database files, so I will bring it back in future on a per-file basis with some UI and more specific database metadata.
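
If you do want to vacuum in the meantime, it is the standard SQLite VACUUM statement, run against each database file while the client is closed. A hedged sketch using Python's sqlite3 (the file names below are the client's usual four databases; make sure you have free disk space around the size of each file, and a backup, before trying this):

    import sqlite3

    # run from your hydrus db directory, with the client shut down
    for name in ('client.db', 'client.caches.db',
                 'client.mappings.db', 'client.master.db'):
        conn = sqlite3.connect(name, isolation_level=None)  # autocommit mode
        conn.execute('VACUUM')
        conn.close()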

EDIT: Thanks to a user submission, the yande.re post parser is updated to pull tags correctly if you are logged in. I hoped my update code would move the link over from the old parser correctly, but it did not. I'll fix this for next week, but if you download from yande.re while logged in, please hit network->downloader components->manage url class links and move 'yande.re file page' from moebooru to 'yande.re post page parser'.

We fixed a couple more problems with the new builds--the Linux and Windows extract builds have their surplus 'ubuntu'/'windows' directories removed, and the Linux executables should have correct permissions again. Sorry for the trouble!

And after some tests, we removed the .py files and the source from the builds. I had long believed it was possible to run the program from source beside the executables, but it seems I was mistaken. Unless you are running the build-adjacent source on pretty much the same machine you built on (as my tests years ago were), you get dll conflicts all over the place. If you want to run from source, just extract the source proper in its own fresh directory. I've also fleshed out the 'running from source' help beyond setting up the environment, to talk more about actually downloading and running the program. I'll continue work here and hope to roll out some easy one-and-done setup scripts to automate the whole thing.

full list

- gui sessions:
- gui sessions are no longer a monolithic object! now, each page is stored in the database separately, and when a session saves, only those pages that have had changes since the last save are written to db. this will massively reduce long-term HDD writes for clients with large sessions and generally reduce lag during session save intervals
- the new gui sessions are resilient against database damage--if a page fails to load, or is missing from the new store, its information will be recorded and saved, but the rest of the session will load
- the new page storage can now be shared across sessions. multiple backups of a session that use the same page now point to the same record, which massively reduces the size of client.db for large-sessioned clients
- your existing sessions and their backups will obviously be converted to the new system on update. if any fail to load or convert, a backup of the original object will be written to your database directory. the conversion shouldn't take more than a minute or two
- the old max-object limit at which a session would fail to save was around 10M files and/or 500k urls total. it equated to a saved object larger than 1GB, which hit an internal SQLite limit. sessions overall now have no storage limit, but individual pages now inherit the old limit. please do not hurry to try to test this out with giganto pages. if you want to run a heap of large long-term downloaders, please spread the job across several pages
- it seems URLs were the real killer here, so I am rebalancing the weights so URLs now count for 20 each. the weight limit at which a _page_ will now fail to save, and the client will start generally moaning at you for the whole session (which can be turned off in the options), is therefore raised to 10M. most of the checks are still session-wide for now, but I will do more work here in future
- if you are in advanced mode, then each page now gives its weight (including combined weight for 'page of pages') from its tab right-click menu. with the new URL weight, let's get a new sense of where the memory is actually hanging around IRL
- the page and session objects are now more healthily plugged into my serialisation system, so it should be much easier to update them in future (e.g. adding memory for tag sort or current file selection)
- .
- the rest:
- when subscriptions die, the little reporting popup now includes the death file velocity ('it found fewer than 1 files in the last 90 days' etc...)
- the client no longer does vacuums automatically in idle time, and the soft/full maintenance action is removed. as average database size has grown, this old maintenance function has increasingly proved more trouble than it is worth. it will return in future as a per-file thing, with better information to the user on past vacuums and empty pages and estimates on duration to completion, and perhaps some database interrupt tech so it can be cancelled. if you really want to do a vacuum for now, do it outside the program through a SQLite interpreter on the files separately
- thanks to a user submission, a yande.re post parser is added that should grab tags correctly if you are logged in. the existing moebooru post parser default has its yande.re example url removed, so the url_class-parser link should move over on update
- for file repositories, the client will not try to sync thumbnails until the repository store counts as 'caught up' (on a busy repo, it was trying to pull thumbs that had been deleted 'in the future'). furthermore, a 404 error due to a thumb being pulled out of sync will no longer print a load of error info to the log. more work will be needed here in future
- I fixed another stupid IPFS pin-commit bug, sorry for the trouble! (issue #894)
- some maintenance-triggered file delete actions are now better about saving a good attached file deletion reason
- when the file maintenance manager does a popup with a lot of thumbnail or file integrity checks, the 'num thumbs regenned/files missing or invalid' number is now preserved through the batches of 256 jobs
- thoroughly tested and brushed up the 'check for missing/invalid files' maintenance code, particularly in relation to its automatic triggering after a repository processing problem, but I still could not figure out specifically why it is not working for some users. we will have to investigate and try some more things
- fixed a typo in client api help regarding the 'service_names_to_statuses_to_display_tags' variable name (I had 'displayed' before, which is incorrect)
- .
- build fixes:
- fixed the new Linux and Windows extract builds being tucked into a little 'ubuntu'/'windows' subfolder, sorry for the trouble! They should both now have the same (note Caps) 'Hydrus Network' as their first directory
- fixed the new Linux build having borked permissions on the executables, sorry for the trouble!
- since I fixed the urllib3 problem we had with serialised sessions and Retry objects, I removed it from the requirements.txts. now 'requests' can pull what it likes
- after testing it with the new build, it looks like I was mistaken years ago that anyone could run hydrus from source when inside a 'built' release (due to dll conflicts in CWD vs your python install). maybe this is now only true in py3 where dll loading is a little different, but it was likely always true and my old tests only ever worked because I was in the same/so-similar environment so the dlls were not conflicting. in any case the builds no longer include the .py/.pyw files and the 'hydrus' source folder, since it just doesn't seem to work. if you want to run from source, grab the actual source release in a fresh, non-conflicting directory. I've updated the help regarding this, sorry for any trouble or confusion you have ever run into here
- updated the running from source document to talk more about actually getting the source and fleshed out the info about running the scripts
- .
- misc boring refactoring and db updates:
- created a new 'pages' gui module and moved Pages, Thumbs, Sort/Collect widgets, Management panel, and the new split Session code into it
- wrote new container objects for sessions, notebook pages, and media pages, and wrote a new hash-based data object for a media page's management info and file list
- added a table to the database for storing serialised objects by their hash, and updated the load/save code to work with the new session objects and manage shared page data in the hashed storage
- a new maintenance routine checks which hashed serialisables are still needed by master containers and deletes the orphans. it can be manually fired from the _database->maintenance_ menu. this routine otherwise runs just after boot and then every 24 hours or every 512MB of new hashed serialisables added, whichever comes first (a rough sketch of what this sweep amounts to follows this list)
- management controllers now discard the random per-session 'page key' from their serialised key lookup, meaning they serialise the same across sessions (making the above hash-page stuff work better!)
- improved a bunch of access and error code around serialised object load/save
- improved a heap of session code all over
- improved serialised object hashing code
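
As noted above, the orphan sweep boils down to deleting any stored blob that no master container still references. Roughly, with hypothetical table names (the real schema will differ):

    import sqlite3

    def sweep_orphan_hashed_objects(conn: sqlite3.Connection) -> None:
        # table and column names here are invented for illustration
        conn.execute(
            'DELETE FROM hashed_objects '
            'WHERE hash NOT IN ( SELECT hash FROM container_page_refs )'
        )
        conn.commit()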

next week

I have one more week of work before my vacation. There's a ton of little jobs I have been putting off--checking new downloaders users sent in, some more help docs to work on, and magically growing multi-column list dialogs--as well as emails and other messages I haven't got to. I'll try to tidy up those loose ends as best I can before I take my break. I'll also deal with any problems with these new GUI Sessions.
excellent update hydev, all of my previous stutters are gone! thank you very much!
I had a great week working on small quality of life issues. A couple of bugs are fixed, some UI lag is reduced, and I worked on some layout too. Just a mix of cleanup before my vacation next week.

I have some unavoidable IRL tomorrow, so the release may be a bit later than usual.

 >>/1083/
Great, thanks for letting me know!

https://youtube.com/watch?v=EJLNLWv-nmM
windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v441/Hydrus.Network.441.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v441/Hydrus.Network.441.-.Windows.-.Installer.exe
macOS
app: https://github.com/hydrusnetwork/hydrus/releases/download/v441/Hydrus.Network.441.-.macOS.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v441/Hydrus.Network.441.-.Linux.-.Executable.tar.gz

I had an ok week. Not as much as I wanted, but there are some nice Client API improvements.

all misc this week

The test builds from last week seem to work ok, so they are now master. The built clients now use Python 3.8, and the security libraries (like OpenSSL) are all much newer--and will reliably stay up to date in future--so a whole bunch of things across the client should have slightly better performance. There are no special install instructions--the new builds seem to work on an existing install just as normal. Let me know if you do run into any problems!

I fixed some more bad tile calculations for the new tiled image renderer. Files that showed little black lines on an edge at some zooms, or previews that just turned up black, should be fixed! Error reporting is also nicer.
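
The underlying fix is essentially clamping: when a tile's on-screen rectangle is mapped back to native pixels, floating-point rounding can push the sample one pixel past the image edge. A minimal sketch of the idea (my illustration, not the renderer's actual code):

    # map a tile's canvas rect back to a native-resolution sample rect,
    # clamped so rounding cannot request a row/column past the image edge
    def native_sample_rect(tile_x, tile_y, tile_w, tile_h, zoom,
                           native_w, native_h):
        x0 = max(0, int(tile_x / zoom))
        y0 = max(0, int(tile_y / zoom))
        x1 = min(native_w, round((tile_x + tile_w) / zoom))
        y1 = min(native_h, round((tile_y + tile_h) / zoom))
        return x0, y0, x1 - x0, y1 - y0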

The Client API can now do a couple more things. Particularly, it can now set your client's global User-Agent, which should help fix some difficult CDN and login problems in future. Please watch this space.
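
For example, from an external script (a sketch using Python's requests; I am going from the v441 Client API additions listed below, so double-check the endpoint and parameter names against your client's built-in API help):

    import requests

    api = 'http://127.0.0.1:45869'  # default client api address
    headers = {'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY_HERE'}

    # set the client's global User-Agent, e.g. to match your browser's
    requests.post(f'{api}/manage_headers/set_user_agent',
                  headers=headers,
                  json={'user-agent': 'Mozilla/5.0 (copied from your browser)'})

    # the new /get_services call, which now exposes service_keys
    print(requests.get(f'{api}/get_services', headers=headers).json())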

For advanced users, if you have help->advanced mode on, then setting a namespace file sort now allows you to choose which 'tag context' the sort works on. If you hide certain tags in single or multiple media view (as set in tags->manage tag display and search), then those hidden tags will not count for the sort. This is obviously advanced, so if you hadn't thought of it, you can just set 'display tags' to keep 'normal' behaviour.

full list

- misc:
- after successful testing, all the master builds are now made on github rather than my home dev situation. the clients now work off python 3.8, and several security libraries (e.g. OpenSSL) are now always going to be latest, so there should be several quiet performance and reliability improvements across the program. there are no special install instructions--normal update seems to go fine--but let me know if you do have any trouble. big thanks to the user who did the leg work on developing the workflow build scripts here
- if you are in advanced mode, namespace file sorting now allows you to set the 'tag display context' on which it will sort. this appears as a new menu button or a button list selection dialog wherever you edit namespace file sorts. if you are not in advanced mode, the default is the 'display tags' I switched to last week (i.e. before any tags are hidden by your tag display options)
- namespace sort has some related code cleanup. the 'defaults' object is updated and moved to the newer options object
- the new tiled renderer now checks for rounding errors in zoom calc, which in some cases was giving a single extra (non-existing) native pixel row or column on rightmost or bottommost tile samples
- the new tiled renderer now double-checks clip regions for validity before attempting to crop
- improved the reported error information when a tile fails to render
- when pasting an uneven number of tags into manage siblings/parents, the error is now a nicer popup dialog. I'm pursuing a related error here--if you get this a bunch, please let me know what more info you discover
- when repositories fail to fetch the update hashes to process, they now force a metadata resync. any processing error should force a metadata resync now
- added a default url class for the new pixiv _artist_ page format
- fixed a recent typo bug with ipfs pinning
- .
- client api additions:
- the client api has a new /manage_headers/set_user_agent call, which is a simple hack for now for external programs to set the 'Global' User-Agent. it should allow for some CloudFlare solutions when just copying cookies is not enough
- the client api has a new /get_services call, which talks about more services and also exposes service_keys for the first time, which are likely to be useful in future. check out the help for an example. the old /add_tags/get_tag_services call is now deprecated, please move to the new call
- the client api /version call now responds with 'hydrus_version' as well, which this week will be 441
- the client api now has a semi-experimental /manage_database/lock system, just like the server's. a new 'manage database' permission is added for this. don't play around with this system idly.
- the client api should now support sha256 hash parameters if they start with a type prefix like 'sha256:0123789abcdef...'
- the client and server's database lock commands now wait up to five seconds for the database to finish disconnecting before they respond
- expanded client api unit tests to cover the above
- the client api version is now 17
- .
- boring multiple local file services work:
- the main search object now stores the file domain using a new 'location context' object that will in future hold multiple file services and can say whether we should search files currently in a domain, or those once deleted from it. a variety of back-end search code has been updated to deal with this more flexible situation
- removed more static references to the single 'my files' domain in db and related code. in a couple places, like mr. bones, it now fetches 'all local files', but this will likely be updated in future to a new umbrella 'all non-trash, non-repo-update-files local files' service

next week

I've had some real trouble keeping up recently, but that's ok. A bunch of it is out of my control, so I'll keep pushing anyway. Next week is due to be a 'medium' job week, and I would like to break up the gui session object into smaller pieces. Instead of saving the whole thing, it'll track and save and share individual pages. This will greatly reduce the random CPU lag and HDD use on any client with a large session, let crazy users store more than 500,000 files in a session at once, and allow us to save changes more often. Basically the same improvement I made to subscriptions and the network objects in the last year, but for gui sessions.

I'm due to take my vacation week in two weeks, so I'll aim to have a simple 'clean' release week after next.
I had a great week. I succeeded in overhauling the client's GUI sessions, greatly reducing the storage and write I/O required for sessions. This particularly benefits clients that have sessions storing many files or URLs.

The release should be as normal tomorrow.
