- Endchan Magrathea

Anon
11/6/2024 04:54:00 No. 11285
a_normal_conversation-Larkin-20230707-youtube-1920x1080-Zoq4mu3XfiA.mp4 (671.74 KB, 1920x1080)
Downloading the web! You will sometimes be disgusted by websites that shove a Cuckflare captcha in the face of every single visitor. Selenium is sometimes the solution: in certain cases it works when everything else (wget, grab-site, lynx, curl) fails.

Selenium can open a web browser and then save the source of the page it loaded. It is automation software, so the whole thing can be controlled from the CLI. It works with Chrome, Brave, and probably other browsers. Selenium or something like it is used by the Wayback Machine (WBM) and archive.today. We know WBM uses a browser because you can sometimes see error messages which indicate that one was used to automatically download the webpage or file. archive.today also sometimes has errored captures; those show a Chrome error page.

Archivists should know how to use it because it's definitely helpful in some cases. Below is a CGI script I wrote to make Selenium use Brave to download a page. It works as a .sh or as a single command, but it doesn't work in cgi-bin/ because Apache limits what www-data can do:
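#!/bin/bash
# CGI script: take ?url=... from the request, load it in Brave via Selenium,
# and save the page source plus a small .txt file recording the URL.
echo "Content-type: text/plain"
echo
# Everything after "?url=" in the request URI is the target URL.
url="$(echo -n "$REQUEST_URI" | sed 's/.*?url=//')"
# Replace characters that are awkward in filenames with "-".
urlsafe="$(echo -n "$url" | sed 's|[/:?=@&(),+*%#]|-|g')"
time="$(date -u +%Y%m%d%H%M%S)"
echo "$url"
echo "$urlsafe"
echo "$time"
# The URL is passed as an argument (sys.argv[1]) rather than pasted into the
# Python source, so a quote character in the URL cannot break the code.
python3 -c "
import sys
from selenium import webdriver
options = webdriver.ChromeOptions()
options.binary_location = '/usr/bin/brave-browser'
driver = webdriver.Chrome(options=options)
driver.get(sys.argv[1])
print(driver.page_source)
driver.quit()
" "$url" > "/zc/put/cunt/selenium/$time-$urlsafe"
echo "$url" > "/zc/put/cunt/selenium/$time-$urlsafe.txt"

video unrelated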