/pdfs/ - Burn the books, save files now!

Endchan's ebook board




File: tumblr_mno72hcWcY1ste7qoo1_1280.jpg (93.21 KB, 1024x764)
If you're scanning properly, the book is destroyed in the process, sadly. I think with certain glues you can put it in the microwave to get the pages out. Obviously don't destroy books that are actually worthwhile.

 >>/94/

My offhand attempt at a meta-FAQ, copied largely from unknown authors elsewhere ;) 

(I) Destructive scanning - generally necessary only if the book has a tight binding and you don't have a book edge scanner or an overhead scanner (DIY or manufactured).

(II) Non-destructive scanning - most books can be done this way, even with conventional (not book edge) flatbed scanners.

*If you don't read anything else, please bear these uppermost in mind: 

A) don't scan-stoopid, as in scanning a book with b&w text in True Color at 72dpi, or for that matter at anything less than 300dpi (or in general more than 400dpi, with 600dpi as the max). Don't scan normal b&w text in color, because the resulting image size will be prohibitive. Scan b&w text in grayscale, which you can then process in the likes of ScanTailor and even turn into b&w that looks nicer than a scanner's own b&w.

B) unless you are an expert with limitless time available, don't scan to OCR only. In other words, set images (tiff, png etc) as your scanner's output and bind these into pdf or djvu, preferably after tidying up. If you know how, you can add a hidden searchable text layer. If you don't, or you want flowable text as in epub etc, then as a novice you will likely mess up, so take the page-images route outlined above, and some nice person out there who knows how, has time to spare and thinks it worth it might just do the conversion...

What follows gives the basics imo, and should help you NOT mess up. If you really need more detail, specialist webpages exist, such as DIY bookscanner, Scanning FAQ etc etc. *
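To put point A in numbers, here is a rough Python sketch of the uncompressed size of one page image at different scanner settings. The 6 x 9 inch page and the function name `page_bytes` are assumptions for illustration only, and real files will be smaller once compressed.

```python
# Rough uncompressed size of one scanned page, for comparing scanner settings.
# Assumption: a 6 x 9 inch page; compressed files will be smaller than this.

BITS_PER_PIXEL = {"color": 24, "grayscale": 8, "bw": 1}


def page_bytes(width_in, height_in, dpi, mode):
    """Return the approximate uncompressed size in bytes of one page image."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] // 8


if __name__ == "__main__":
    for dpi in (300, 600):
        for mode in ("color", "grayscale", "bw"):
            mb = page_bytes(6, 9, dpi, mode) / 1_000_000
            print(f"{dpi} dpi {mode:9s} {mb:7.2f} MB per page")
```

Before compression, 24-bit color takes three times the space of 8-bit grayscale, and doubling the dpi quadruples the pixel count, which is why color at high dpi gets out of hand quickly for plain text.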

[continues...]

 >>/169/
[...continuing]

Scanning as an open book (landscape mode, both pages in one image) makes the result hard to read on tablets and other mobile devices. How many people read a whole book sitting at a PC?

Basic suggestions - getting it (broadly) right involves time and effort:

1. scan routinely at 300dpi. Treat 200dpi as the absolute minimum for text (and then only rarely); anything lower will make the OCR layer fail or be unreliable.

2. scan text in grayscale (or for smaller output file AND if the original book is well printed, in black and white). Reserve color for color originals like covers, illustrations and graphics.

3. if the original book has thin pages that show the text on the reverse side when scanned, insert a black piece of card behind/under the page being scanned. Scanning takes longer but this will render the result far superior.

4. get a book edge scanner, often cheaply available on ebay etc, so you can scan right up to the center binding margin without missing text or damaging the book.

5. only use more than 300dpi if the text is very small and in no event use more than 600dpi as it can confuse OCR.

6. always examine a test page's scan before proceeding. A faded or discolored page can often scan well if the scanner software's own gamma/midrange, contrast and dark/lightness controls are adjusted to suit.

7. to retain the ability to clean up scanned images before assembling them in pdf or djvu, consider scanning to individual files in tif format (uncompressed, packbits or lzw), or png. After scanning you can then clean up the images to center, align and remove black shading etc in programs like ScanTailor. Check out sites on diy scanners and software.

8. always assemble as a pdf or djvu with the (cleaned up) images in it and a searchable text layer underneath them. Adobe Acrobat Professional does this but you can find free (but slightly inferior, OCR-wise) alternatives for MS Windows etc.

9. don't assemble pdf or other output e.g. .doc as OCR only, i.e. recognised text without the original page images. OCR and page formatting are still very hit and miss, and readers appreciate the look of the original item, which is likely to be more memorable than generic text that doesn't retain the original fonts or layout.
Also check that all original pages are present, in order and in the correct orientation.

10. use your pdf creating program to optimize the final pdf so its size is reasonable without noticeably compromising image quality.

11. don't get too hung up on the process and suffer workflow block by fretting over scanned page image quality. Once you've scanned a few you will easily churn them out to a high standard without wasting time on too much finessing.
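The "all pages present and in order" check from point 9 is easy to automate before binding. A minimal Python sketch, assuming one file per page with the page number at the end of the filename (e.g. page_0001.tif); that naming scheme and the helper names `page_numbers` and `check_pages` are hypothetical, so adapt them to whatever your scanner software writes out.

```python
import re

# Assumption: one page per file, with the page number just before the
# extension, e.g. page_0001.tif. This naming scheme is hypothetical.
PAGE_RE = re.compile(r"(\d+)\.(?:tif|tiff|png)$", re.IGNORECASE)


def page_numbers(filenames):
    """Extract the page number from each filename that matches PAGE_RE."""
    numbers = []
    for name in filenames:
        match = PAGE_RE.search(name)
        if match:
            numbers.append(int(match.group(1)))
    return numbers


def check_pages(filenames):
    """Return (missing, duplicated) page numbers between the first and last page found."""
    numbers = page_numbers(filenames)
    if not numbers:
        return [], []
    seen = set(numbers)
    missing = [n for n in range(min(seen), max(seen) + 1) if n not in seen]
    duplicated = sorted({n for n in numbers if numbers.count(n) > 1})
    return missing, duplicated


if __name__ == "__main__":
    files = ["page_0001.tif", "page_0002.tif", "page_0004.tif", "page_0004.png"]
    print(check_pages(files))
```

This only catches gaps and duplicates in the numbering. Orientation and pages scanned under the wrong number still need a quick look by eye.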

