/hydrus/ - Hydrus Network

Bug reports, feature requests, and other discussion for the hydrus network.

File: FA_Compatibility_v4.1 (829 B)
Hey,

Since my last message, I have had time to work with Hydrus, and I developed a semi-functional parser for my HTML files. I don't want to include the description string in the tags, so I need to remove it, but I'm not sure how to do this. I assume that it has something to do with the "string selector/slicer" component, but I'm not sure how that component works. I managed a very crude filter with the regex ^[^<]+$, which drops any string containing the < of an html tag, but this will presumably fail for any description that doesn't include html tags. I would really appreciate your advice on this. If you need to see my (bad) code, I have attached it as a text file.

Thanks!

PS: I can't import sidecar parsers through pngs, as the file selection dialog doesn't show png files, just folders. Is there something I'm doing wrong?
File: Screenshot 2024-02-24 132030.png (46.14 KB, 1454x987)
Yeah, this is tricky. I still think your best answer is to wait for me to implement a proper xml/html parser for sidecars, or do the parsing yourself in an external script (e.g. with python) and then convert your html into nicer .txt files that hydrus can suck up easier. You could also filter your undesired tags better there.
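
As a rough idea of what the external-script route could look like, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption for illustration: the class name "description", the sample markup, and the helper names html_to_tags and write_sidecar are hypothetical, and it expects reasonably well-formed html where non-void tags are closed. It collects the text nodes, skips anything inside the description element, and writes one tag per line to a .txt file.

```python
from html.parser import HTMLParser
from pathlib import Path


class TagTextExtractor(HTMLParser):
    """Collect text nodes, skipping everything inside elements with a given class."""

    # Void elements never get a closing tag, so they must not affect nesting depth.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, skip_class):
        super().__init__()
        self.skip_class = skip_class
        self.skip_depth = 0  # greater than 0 while inside the skipped element
        self.strings = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or self.skip_class in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.VOID:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.strings.append(text)


def html_to_tags(html, skip_class="description"):
    """Return the text nodes of `html`, minus anything under `skip_class`."""
    parser = TagTextExtractor(skip_class)
    parser.feed(html)
    parser.close()
    return parser.strings


def write_sidecar(html_path, out_path, skip_class="description"):
    """Write one tag per line to `out_path` (assumes a newline-separated .txt sidecar)."""
    html = Path(html_path).read_text(encoding="utf-8")
    Path(out_path).write_text("\n".join(html_to_tags(html, skip_class)), encoding="utf-8")
```

With made-up markup like `<a class="tag">artist:foo</a><p class="description">Some <b>long</b> text</p>`, html_to_tags would keep "artist:foo" and drop the description, whether or not the description contains html tags of its own.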

If the 'description' string is always in the same location, and that location is always the first or last index of your list of strings, then the 'string selector' might help. It does list slicing like in programming, if you are familiar, like "my_list[4:6]". If you aren't familiar with that, or the description is in the middle of the list here, you are correct in trying a string match, which is basically a filter.
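
To make the difference between the two approaches concrete, here is a small plain-Python analogy. It is not hydrus code, and the helper names and sample strings are made up for illustration. The slice picks strings by position, while the regex acts as a keep-or-drop filter on each string.

```python
import re


def select_slice(strings, start, end):
    """Positional selection, the same idea as my_list[start:end]."""
    return strings[start:end]


def keep_matching(strings, pattern):
    """Filter: keep only the strings that match the regex."""
    return [s for s in strings if re.search(pattern, s)]


strings = ["artist:foo", "blue sky", "A <b>long</b> description"]

# If the description is always the last item, a slice drops it by position.
by_position = select_slice(strings, 0, -1)

# ^[^<]+$ keeps only strings that contain no '<' at all.
by_regex = keep_matching(strings, r"^[^<]+$")

# A description without any html tags slips straight through the regex filter.
plain = ["artist:foo", "just a plain description"]
leaky = keep_matching(plain, r"^[^<]+$")
```

Here by_position and by_regex both end up without the description, but leaky still contains "just a plain description", which is the failure case mentioned above.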

If the description line has a classname in the html, you might be able to exclude it with a string match before you Split/Convert all the html garbage away.

I am not totally sure what you mean by being unable to import by pngs, but if I click 'import->from pngs' (pic related is from the 'sidecars' tab of the 'add tags/urls with the import' dialog after you drop some files on the client), I get a file dialog that allows me to import a png like this. Do you get something different?

File: highdiskusage.png (202.89 KB, 1600x900)
File: pngparserbug.png (128.77 KB, 1600x900)
Hey,

Sorry that it took me a while to respond. I figured out a solution to my html parsing issue. Apparently, the parser first splits the sidecar by the specified string and passes the split strings on to the postprocessor, which broke the regex splitting I was using. The result is then passed back to the parser, where it is split by the string again and saved. I worked around this, but thanks for your help anyway. To answer your questions about the parser import, I am using Linux Mint 21.3 Cinnamon with Qt 6.5.2, rather than Windows, and the pngs do not appear in the window. Maybe the APIs are different? I attached a screenshot of this, if that helps.

Thanks for your help!

PS: Why is this client writing TBs of data to my SSD while processing the PTR? I understand if it writes a lot, but multiple TBs still seems a bit excessive. (That figure is a projection from some math I did. I haven't gotten that far yet, just ~750 GB so far.) I attached a screenshot of this too.

Yeah, the write is most likely temp storage. I've heard that in some situations, the actual amount written to disk is less since some temp-file magic means some of the shortest-lived data is purged from the write cache before it can actually be committed, but I don't have good numbers on the topic.

The PTR does a lot of database work. The no_db_temp_files parameter will reduce it by a good amount. Not sure how much, but it would reduce it.

Increasing db_transaction_commit_period may also reduce it, I think, since it will reduce the commit frequency (at the cost of increasing per-transaction size, which will strain your temp folder and slow processing time).

A user also wrote this document with a lot of related info, if it helps: https://hydrusnetwork.github.io/hydrus/Fixing_Hydrus_Random_Crashes_Under_Linux.html

I'm not a Linux expert myself, so I can't talk super confidently here. Let me know how you get on!