E-book saga continues: HTML scraping
As you might imagine, I'm "somewhat" busy working on my IPv6 summit presentation. I wrote this rant a while ago but somehow never managed to publish it.
In a comment to my piracy rant, Steve asked how I feel about Safari. In principle, I like anything that brings my books to readers in a more usable form, and Safari is a perfect idea: a virtual bookshelf, searchable books, and temporary access to books you don’t need permanently. The implementation, however, belongs to the previous century: it’s far too easy to write a bot that scrapes the text from the HTML pages and eventually collects the whole book.
I was acutely reminded of this problem when an anonymous reader posted a link in a comment to one of my CCNA-level posts. It was obvious that the source of the stolen material was an HTML-based e-book split into pages based on section headings (and Safari could be one of the convenient sources). With a few quick Google searches, I found numerous other Cisco Press books available in the same “convenient” format (not to mention that half of the first-page hits for many Cisco Press book titles point to rapidshare.net and its siblings). All I can say is: it’s amazing (and I’m so glad) that you’re still buying dead-tree books.
Therefore, to me, digital books are just as good as physical (analogue? :P) books.
That being said, when dad (me) gets home at the end of the day, e-books just aren't the same when I'm reading to my kids cuddled up around me.