- Endchan Magrathea

Anon
1/1/2024 22:33:00 No. 9195
breakpoint.d9ed8ac2 jpg
(63.29 KB, 800x450)
Twitchyy_Fiddles_Fill... png
(621.14 KB, 1280x720)
 >>/9190/
> Will the Circle be Unbroken PMV
I'm downloading that small channel: Twitchyy Live UCjVuMgZFWcrlKBkRCPHPnMw

 >>/9194/
I'm guessing there's a better way to extract a web file or webpage from .warc.gz. Hacky method:
> $ zcat "QmdhyCk6v3W7sgXyXEVcUdas4cgVbvxLjKs5tZHbry4fAj?filename=SizeOfThisFileIs478MB-2023-09-30-eed417c7-00000.warc.gz" | grep -ai -A 74811 "WARC-Target-URI: https://solana.com/_next/static/media/breakpoint.d9ed8ac2.jpg" > dat.bin
> $ vim dat.bin # go to the last match for "\/_next\/static\/media\/breakpoint.d9ed8ac2.jpg" because the JPG data of the first match isn't it (it's like 120 KB)
> [...]WARC/1.0 \ WARC-Type: response \ Content-Type: application/http;msgtype=response
> WARC-Date: 2023-09-30T23:52:01Z
> WARC-Record-ID: 
> WARC-Target-URI: https://solana.com/_next/static/media/breakpoint.d9ed8ac2.jpg
> WARC-IP-Address: 76.76.21.21
> WARC-Concurrent-To: 
> WARC-Block-Digest: sha1:4QZNA6CMPMVJOCXHZPKK6JYJIHD3MQWW
> WARC-Payload-Digest: sha1:TN6JD3AMSTOXXC6DVZU3TEQ2XBFA3YCB
> Content-Length: 65360
> WARC-Warcinfo-ID: 
> 
> HTTP/1.1 200 OK
> [...]Content-Disposition: inline; filename="breakpoint.d9ed8ac2.jpg"
> Content-Length: 64811 \ Content-Type: image/jpeg
> Date: Sat, 30 Sep 2023 23:52:01 GMT \ [...]Server: Vercel
> [...]X-Matched-Path: /_next/static/media/breakpoint.d9ed8ac2.jpg
> X-Vercel-Cache: HIT \ X-Vercel-Id: sfo1::xrth7-1696117921088-0dc003128811
> [...binary data for JPG file starting on line 103...]
> [use vim to delete everything that isn't that JPG data then save to "d.bin.1.jpg"]
> $ xxd -c 999 -ps d.bin.1.jpg | sed "s/0d0a$//g" | tr -d \\n | xxd -ps -r - > breakpoint.d9ed8ac2.jpg
> $ sha1sum breakpoint.d9ed8ac2.jpg
> 9b7c91ec0c94dd7b8bc3ae69b9921ab84a0de041 [...]
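A possibly less hacky route, as a rough sketch: a WARC record is a header block, a blank line, then exactly Content-Length bytes, so the response can be sliced out by byte length instead of by grep line counts and vim. The function names (`iter_warc_records`, `find_payloads`) are made up for illustration, and this assumes the HTTP body is stored as-is (no chunked Transfer-Encoding or Content-Encoding to undo). Python stdlib only:

```python
import gzip


def iter_warc_records(stream):
    """Yield (headers, block) pairs from an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line[:40])
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the WARC header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # The block is exactly Content-Length bytes; no line counting needed.
        block = stream.read(int(headers["content-length"]))
        yield headers, block


def find_payloads(warc_gz_path, target_uri):
    """Return the HTTP bodies of all response records for target_uri."""
    payloads = []
    # gzip.open reads through the concatenated gzip members of a .warc.gz.
    with gzip.open(warc_gz_path, "rb") as stream:
        for headers, block in iter_warc_records(stream):
            if headers.get("warc-type") != "response":
                continue
            if headers.get("warc-target-uri", "").strip("<>") != target_uri:
                continue
            # Split the stored HTTP response into header section and body.
            _http_headers, _sep, body = block.partition(b"\r\n\r\n")
            payloads.append(body)
    return payloads
```

Since the first match wasn't the right one here, something like `find_payloads(path, "https://solana.com/_next/static/media/breakpoint.d9ed8ac2.jpg")[-1]` would take the last match, same as the vim step; write the bytes to a file and check them with sha1sum as before.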

What's the encoding for "WARC-Payload-Digest: sha1:..."? It doesn't look like Base64.
Also, with webpages in .warc.gz, I've seen split markers every 80,000 or so lines. Some webpages are just one very long line, like one million bytes of plain text all on one line; in a WARC that one-million-byte line gets split into multiple ~80K lines, yet it can still be replayed or retrieved as the original single-line webpage.
2nd image from
> ./youtube/Twitchyy_Live_UCjVuMgZFWcrlKBkRCPHPnMw/Twitchyy_Fiddles_Fillies_Live_Filly_Astray-Twitchyy_Live-20190624-youtube-1920x1080-gOdTUxQbpfo.png
Maybe it's that one game that one guy in /pag/ was asking about.
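
On the WARC-Payload-Digest question: the value looks like Base32 (A-Z plus 2-7) of the raw 20-byte SHA-1, not Base64 or hex. A quick check using only the two values quoted in this post; the helper name `base32_digest_to_hex` is made up for illustration:

```python
import base64

payload_digest = "TN6JD3AMSTOXXC6DVZU3TEQ2XBFA3YCB"  # from the WARC-Payload-Digest header
sha1sum_hex = "9b7c91ec0c94dd7b8bc3ae69b9921ab84a0de041"  # from the sha1sum output


def base32_digest_to_hex(digest):
    """Decode a Base32 digest label into the hex form that sha1sum prints."""
    return base64.b32decode(digest).hex()


print(base32_digest_to_hex(payload_digest) == sha1sum_hex)  # prints True
```

So the extracted JPG matches the payload digest recorded in the WARC header.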