I've been running this type of workflow on and off for the past few months. I was fixing corruption (I save blocks to two HDDs and the non-ZFS one got a bit messed up):
> $ # If one repo/HDD has only part of a CID and another repo has a complete copy, run this to fill in what's missing (repeat until the pin succeeds):
> $ cid=QmQh8RwLvQv91b8rLrmbL4zJE4v5Rg9gPU64R9EUdopunW
> $ has_all=/z2/b/ipfs/.ipfs
> $ has_part=/mnt/n/b/ipfs/.ipfs
> $ ipfs pin add --progress $cid 2> >(tee b.txt >&2); h=$(cat b.txt | sed "s/.* //g"); echo $h; IPFS_PATH=$has_all; ipfs dag export $h > $h.car; IPFS_PATH=$has_part; ipfs dag import --stats --pin-roots=false $h.car; rm $h.car
> $ # repeat previous command 14 times
> $ !!; !!; !!; !!; !!; !!; !!; !!; !!; !!; !!; !!; !!; !!; !!
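If you'd rather not count the !!s, the same thing can be wrapped in a loop. Only a sketch: it assumes pin add keeps exiting non-zero until every block is local, and that the missing block's CID is the last word of the last stderr line (which is what the sed above already relies on). IPFS_PATH is prefixed onto each command here so it applies even if the variable isn't exported:
> $ until IPFS_PATH=$has_part ipfs pin add --progress $cid 2> >(tee b.txt >&2); do
>     h=$(sed "s/.* //g" b.txt | tail -n 1)   # CID of the block pin add stopped at
>     echo $h
>     IPFS_PATH=$has_all ipfs dag export $h > $h.car                          # pull that subtree from the intact repo
>     IPFS_PATH=$has_part ipfs dag import --stats --pin-roots=false $h.car    # push it into the damaged one
>     rm $h.car
>   done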
A seq-plus-xargs one-liner seemed more elegant but didn't work for some reason. Oh, it's because it uses sh -c and not bash -c, and the 2> >(tee ...) process substitution is a bashism:
> $ seq 1000 | xargs -d "\n" sh -c 'for args do ipfs pin add --progress QmQh8RwLvQv91b8rLrmbL4zJE4v5Rg9gPU64R9EUdopunW 2> >(tee b.txt >&2); h=$(cat b.txt | sed "s/.* //g"); echo $h; IPFS_PATH=/z2/b/ipfs/.ipfs; ipfs dag export $h > $h.car; IPFS_PATH=/mnt/n/b/ipfs/.ipfs; ipfs dag import --stats --pin-roots=false $h.car; rm $h.car; done' _
> _: 1: Syntax error: redirection unexpected
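So swapping in bash -c should be all it needs; everything else stays the same:
> $ seq 1000 | xargs -d "\n" bash -c 'for args do ipfs pin add --progress QmQh8RwLvQv91b8rLrmbL4zJE4v5Rg9gPU64R9EUdopunW 2> >(tee b.txt >&2); h=$(cat b.txt | sed "s/.* //g"); echo $h; IPFS_PATH=/z2/b/ipfs/.ipfs; ipfs dag export $h > $h.car; IPFS_PATH=/mnt/n/b/ipfs/.ipfs; ipfs dag import --stats --pin-roots=false $h.car; rm $h.car; done' _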
>>/9051/
> [crawls of wikis] were often very problematical and poorly constructed no matter what I did.
Just use grab-site. It writes WARCs, which capture web/HTTP content in its actual native form. grab-site works in GNU/Linux and maybe also Windows 10. I use it (sample invocation after the quotes below):
https://github.com/ArchiveTeam/grab-site
> The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
> [history:] grab-site is made possible only because of wpull, written by Christopher Foo who spent a year making something much better than wget. ArchiveTeam's most pressing issue with wget at the time was that it kept the entire URL queue in memory instead of on disk. wpull has many other advantages over wget, including better link extraction and Python hooks.
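Basic usage looks roughly like this, from memory of its README, so double-check the flags with grab-site --help; the wiki URL is a placeholder:
> $ gs-server    # in one terminal: dashboard for all crawls at http://127.0.0.1:29000/
> $ grab-site --no-offsite-links 'https://wiki.example.org/'    # in another terminal: start the crawl
It drops the WARCs and logs into a new directory named after the site. There are also canned ignore sets (--igsets=forums etc.), and the ignores file in the crawl directory can be edited while the crawl runs, which is the "dynamic ignore patterns" bit quoted above.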