- Endchan Magrathea

Anonymous
6/30/2021 22:40:00 No. 1096 [Open] a5f35a [Reply] report
- the 'review threads' debug UI now has two new tabs for the job schedulers. I will be working with UI-lag-experiencing users in future to see where the biggest problems are here. I suspect part of it will overhead from downloader thread spam, which I have more plans for
- all jobs that threads schedule on main UI time are now profiled in 'callto' profile mode
- .
- site encoding fixes:
- fixed a problem with webpages that report an encoding for which there is no available decoder. This error is now caught properly, and if 'chardet' is available to provide a supported encoding, it now steps in fixes things automatically. for most users, this fixes japanese sites that report their encoding as "Windows-31J", which seems to be a synonym for Shift-JIS. the 'non-failing unicode decode' function here is also now better at not failing, ha ha, and it delivers richer error descriptions when all attempts to decode are non-successful
- fixed a problem detecting and decoding webpages with no specified encoding (which defaults to windows-1252 and/or ISO-8859-1 in some weird internet standards thing) using chardet
- if chardet is not available and all else fails, windows-1252 is now attempted as a last resort
- added chardet presence to help->about. requests needs it atm so you likely definitely have it, but I'll make it specific in requirements.txt and expand info about it in future
- .
- boring code cleanup:
- refactored the base file import job to its own file
- client import options are moved to a new submodule, and file, tag, and the future note import options are refactored to their own files
- wrote a new object to handle current import file status in a better way than the old 'toss a tuple around' method
- implemented this file import status across most of the import pipeline and cleaned up a heap of import status, hash, mime, and note handling. rarely do downloaders now inspect raw file import status directly--they just ask the import and status object what they think should happen next based on current file import options etc...
- a url file import's pre-import status urls are now tested main url first, file url second, then associable urls (previously it was pseudorandom)
- a file import's pre-import status hashes are now tested sha256 first if that is available (previously it was pseudorandom). this probably doesn't matter 99.998% of the time, but maybe hitting 'try again' on a watcher import that failed on a previous boot and also had a dodgy hash parser, it might
- misc pre-import status prediction logic cleanup, particularly when multiple urls disagree on status and 'exclude previously deleted' is _unchecked_
- when a hash gives a file pre-import status, the import note now records which hash type it was
- pulled the 'already in db but doesn't actually exist on disk' pre-import status check out of the db, fixing a long-time ugly file manager call and reducing db lock load significantly
- updated a host of hacky file import unit tests to less hacky versions with the new status object
- all scheduled jobs now print better information about themselves in debug code
next week

Next week is a 'medium size job' week. I would like to do some server work, particularly writing the 'null account' that will inherit all content ownership after a certain period, completely anonymising history and improving long-term privacy, and then see if I can whack away at some janitor workflow improvements.