- multiline parsing:
- the parser now supports limited multiline parsing. the main changes are hardcoded: the formulae beneath note content parsers and those that do subsidiary page parser splitting no longer remove newlines when they parse. all the parsing UI and the test panels and so on are now aware of this and set flags in all the right places, and parsed notes are now washed through the new trimming/cleaning method, and everything _seems_ to basically work. the main remaining problem is that the complicated string processing UI has mixed single/multi-line testing support. some of it looks great, but most gets coerced to single-line just for the previewed test results
- as an example, the default hentai foundry downloader now grabs the artist description as a multi-line note
- the parsing sub-system that extracts cohesive strings from complex html blocks now inserts newlines at 'p' and 'br' tags
- trying to parse clean multiline notes still caused several formatting issues this week, so I have updated the automatic note-washing routine to standardise hydrus notes in several new ways that I hope will not be too disruptive to manually written notes:
- the note washing routine now coerces all newline characters to 'backslash-n', regardless of platform
- the note washing routine now trims each line, so no leading or trailing whitespace anywhere. I am open to changing this in future, maybe for handwritten notes where you really want an indent somewhere, but parsing from complex nested html tags is making a heap of weird extra whitespace, for which this is a clean solution
- the note washing routine now trims newline gaps that are longer than two newlines. you can split paragraphs by one empty line, but no more
- there may be other issues figuring out cleanly formatted strings from nested html tags--so give it a go and let me know what you think. maybe p and br blocks should always make two newlines, so we have separated paragraphs, maybe I need to parse more blocks, like h1 and friends. any specific example html blocks would also be helpful
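as a rough illustration of the three washing rules above, here is a minimal python sketch. it is not the actual hydrus code: the function name `wash_note` is hypothetical, and the final whole-note trim is my assumption based on "no leading or trailing whitespace anywhere".

```python
import re

def wash_note(text: str) -> str:
    """Hypothetical sketch of the note-washing rules, not hydrus's real routine."""
    # rule 1: coerce every newline style (Windows '\r\n', old Mac '\r') to '\n'
    text = text.replace('\r\n', '\n').replace('\r', '\n')

    # rule 2: trim each line, so no line keeps leading or trailing whitespace
    text = '\n'.join(line.strip() for line in text.split('\n'))

    # rule 3: collapse any run of three or more newlines down to two,
    # so paragraphs are separated by at most one empty line
    text = re.sub(r'\n{3,}', '\n\n', text)

    # assumption: the note as a whole is also trimmed at both ends
    return text.strip()
```

for example, `wash_note('  hello \r\n\r\n\r\n\r\n  world\r')` gives `'hello\n\nworld'`. doing the per-line trim before the gap collapse matters: lines that contain only whitespace become empty first, so they count towards the newline run that gets collapsed.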
- .
- cleanup:
- refactored ClientGUIParsing to its own 'parsing' module and split everything into four less tangled files
- cleaned up a bunch of taglist text presentation code, mostly simplicity and clarity in prep for future updates
- updated the checker options button to use a Qt signal instead of a callable
next week
I have more small work like this to catch up on, including github issues.