[Image: 1426109984058.png (167.39 KB, 1024x1122)]
 >>/8979/
Done
Bash implementation for: 4chan rendered thread webpage (Ctrl+A, Ctrl+C, Ctrl+V) -> TXT file -> extractor -> one text file per post
[1]

Done, mostly
Bash implementation for: Desuarchive thread webpage source code -> HTML file -> extractor -> one HTML file per post. Proof/example (remote pin=w3s):
https://dweb.link/ipfs/bafybeifiub4ovlskqwosrrnj6r7ofwgbzhfi34kv2pso3j2wrfqtczdrum/desuarchive/mlp/thread/20278564/
[2]

Both methods can take a minute or more to process a thread.

[1] see https://ipfs.filebase.io/ipfs/Qmb7pn6qDfb75QZx65W26JPiffZe5rGjJgpCC4R4hV6F4Y/4chan/mlp/thread/40219665/multiple_how.txt
[2] via
> $ curl -sL https://desuarchive.org/mlp/thread/20278564/ > 20278564.htm
> $ cat 20278564.htm | perl -pE "s/\n//g" | tail -n +191 | head -n199919991999 | head -n -203 | sed "s/ //g" | xxd -p | tr -d \\n | sed "s/../&/g" | perl -pE "s0a/\n/g" | xargs -d "\n" sh -c 'for args do id=$(echo $args | sed "s/.*22%20%69%64%3d%22//g" | sed "s22.*//g" | sed "s///g" | xxd -p -r -); echo $args | sed "s//g" | xxd -p -r - > $id.html; done' _; rm .html
> $ # This partly helped: https://code.whatever.social/questions/296536/how-to-urlencode-data-for-curl-command
> $ # Check: the thread has 337 posts incl. OP and ls | wc -l gives 339; the difference of 2 is the how-to file and the complete thread file.
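
For anyone who wants the same per-post split without the hex round trip, here is a rough sketch. It is not the pipeline above. It assumes each post starts on its own line with an `<article ...>` tag that carries an `id="..."` attribute; the `22%20%69%64%3d%22` pattern in the command decodes to `" id="`, so the id part matches, but the `<article>` tag and the one-post-per-line layout are my assumptions. `split_posts` is a made-up name.

```shell
# split_posts FILE OUTDIR
# Writes one OUTDIR/<id>.html per post.
# Assumption: every post begins on a line containing an <article ...> tag
# with an id="..." attribute. Lines before the first post are dropped;
# anything after the last post (page footer) ends up in the last file.
split_posts() {
  in=$1
  out=$2
  mkdir -p "$out"
  awk -v out="$out" '
    /<article[ >]/ {
      # Close the previous post file before starting a new one.
      if (file != "") close(file)
      file = ""
      if (match($0, / id="[^"]+"/)) {
        # Skip the leading space and id=" (5 chars), drop the closing quote.
        id = substr($0, RSTART + 5, RLENGTH - 6)
        file = out "/" id ".html"
      }
    }
    # Copy the current line into the file of the post it belongs to.
    file != "" { print > file }
  ' "$in"
}
```

Usage would be something like `split_posts 20278564.htm posts`. The footer still has to be trimmed by hand, like the `head -n -203` step does above.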
BTW, in that thread, https://ipfs.io/ipfs/bafybeifiub4ovlskqwosrrnj6r7ofwgbzhfi34kv2pso3j2wrfqtczdrum/desuarchive/mlp/thread/20278564/20334645.html (picrel) says ">Berry Punch is Cheerilee's sister \n I love you meta". That made me think: viewers did kind of see Berry Punch's sister in Sisterhooves Social, and she wasn't Cheerilee.