Month: August 2013

  • Fixing ebooks with errors – A personal challenge

    I have wholly embraced the eBook revolution. As a longtime traveler and SciFi aficionado, I have assembled a large collection of books that I keep reading to mark them off my to-do list.

    Being a fan of science fiction, I have been forced to acquire some of my books by extra-legal means. Since many of the classic tomes of the golden era of SciFi are out of print, and have no official ebook release to buy, I turn to the internet.

    With few exceptions, these books are scanned and OCR’d from print, and then stuffed into a file to read. Lots of early Heinlein, and many obscure authors, exist only this way.

    The problem: OCR still sucks. Even the best algorithms barf on a lot of text, so there are spots of garbage in many of these books.

    I sometimes make it a personal mitzvah to clean up a book.

    A classic example was the “To the Stars” trilogy, by Harry Harrison (his real name, not a nom de plume). It was a rather poor scan and conversion to an RTF file. It was a painful process to fix, but totally worth it, because it made the book completely readable.

    However, if your book is in ePub or PDF format, you have fewer options.

    Sigil, a pretty awesome open source ePub editor

    My go-to program is Sigil. Provided there is no DRM, you can open and inspect the book and fix small things. If you are savvy, you can also dive into the CSS stylesheet and alter fonts, indents, and other text properties (but be warned: some readers ignore much of the CSS, classes included; I’m looking at you, Sony Reader).

    Sigil lets you view the text as it renders, in a split screen with the code below the rendered text, or as pure code. You can fix a lot of errors and glitches using search and replace or by editing the code directly, then save back to the original file.

    A future series of posts will go into depth on how to better structure the ebook.

    Another good program, and a widely used one, is Calibre. A library manager and file-conversion program, it is open source and extensible. It makes it easy to convert from one format to another (Kindle to ePub, LRF to ePub, and many other options).

    A nice touch is that in Calibre you can easily set up the ISBN and cover images, and pull data on the book from public databases. I used Calibre to convert a collection of Doc Savage stories from the LRF format (the original Sony Reader format) to ePub, and to add good cover pictures.

    In fact, most of the ebook files I look at in Sigil, even some commercial books, show signs of having been converted or cleaned up by Calibre.

    Doing this work, you find some things like:

    • Files that came from Microsoft Word, littered with the class="MsoNormal" attribute. Ugh. I don’t usually curse too much about Microsoft Office, but the HTML it outputs, once converted into an ebook, is a crime against humanity.
    • Most ebooks, even commercial, professionally edited and assembled ones, have horrible structure: no proper links to the chapters and no proper table of contents. Commercial books are much more likely to get this right, but it is a disaster in the community-sourced works. I am working up a process to fix that.
    • There are some truly shitty OCR engines out there. Even high-priced, high-performance engines have trouble; the second tier is atrocious. Someone once grumbled on Slashdot about why there weren’t any good (free) open source OCR engines, and the answer is that it is friggin’ hard. Tuning and improving the algorithm often becomes a lifetime’s work, so the people with good engines are in no hurry to give them away.
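    One way to automate the MsoNormal cleanup outside Sigil is a regex pass over the book’s XHTML files. This is a minimal, hypothetical sketch rather than a finished tool: it assumes the content files are UTF-8 XHTML with quoted attributes, and that Word’s leftovers are quoted class="MsoNormal" attributes and inline styles containing mso-* properties. The names strip_word_cruft and clean_epub are illustrative only.

```python
import re
import zipfile

# Assumed Word leftover #1: a quoted class="MsoNormal" attribute (any case),
# matched together with the whitespace before it, so <p class="MsoNormal">
# becomes <p>.
MSO_CLASS = re.compile(r'\s+class\s*=\s*(["\'])MsoNormal\1', re.IGNORECASE)

# Assumed Word leftover #2: a quoted style attribute whose value contains an
# "mso-" property. (?:(?!\1).)* matches anything except the attribute's own
# closing quote, so style='font-family:"Times New Roman";mso-x:y' goes whole.
MSO_STYLE = re.compile(
    r'\s+style\s*=\s*(["\'])(?:(?!\1).)*mso-(?:(?!\1).)*\1',
    re.IGNORECASE | re.DOTALL,
)


def strip_word_cruft(xhtml: str) -> str:
    """Remove MsoNormal classes and mso-* inline styles from one XHTML string."""
    xhtml = MSO_CLASS.sub("", xhtml)
    return MSO_STYLE.sub("", xhtml)


def clean_epub(src_path: str, dst_path: str) -> None:
    """Copy an ePub (a zip archive), cleaning every (X)HTML entry on the way.

    Entries are written in their original order with their original
    compression type, so a leading, uncompressed "mimetype" entry stays
    first and uncompressed. Assumes content files are UTF-8 encoded.
    """
    with zipfile.ZipFile(src_path) as zin, zipfile.ZipFile(dst_path, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.lower().endswith((".xhtml", ".html", ".htm")):
                data = strip_word_cruft(data.decode("utf-8")).encode("utf-8")
            zout.writestr(item.filename, data, compress_type=item.compress_type)
```

    For example, clean_epub("book.epub", "book-clean.epub") writes a cleaned copy and leaves the original untouched, so you can open both in Sigil and compare. Regexes are a blunt instrument for HTML, which is why this only targets two narrow, easily recognized patterns.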

    I rarely make it a mission to fix an ebook, but when I do, I want to leave behind something that is a better reading experience.

    (For the record, if there is a place to buy a book, I will always buy it, but much of what I read is esoteric or out of print, so I am forced into alternatives.)

  • Kudos to Courtyard

    As a road warrior, I spend a lot of time on the road, sleeping in hotel rooms. I have learned to deal with loud ice machines, obnoxious families with kids tearing up and down the hallways at all hours, lumpy beds, and loud air conditioners. Doesn’t matter if it is a $90 La Quinta, or a $300 Hilton room, they all have warts.

    The one thing that usually grinds my gears is the bed in the room. It is either stiff as a board or completely worn out. Either way, that leads to a poor night of sleep and a lot of discomfort (which getting older makes much worse) due to bad posture in bed.

    However, I have to congratulate the Marriott Courtyard in Campbell, CA. The bed is perfect in my room. Supportive, comfortable, and I have had two good nights’ sleep in a row, a real rarity!

    I will stop complaining about the slow as molasses in January elevators in Courtyards if you can make sure that all the beds are this perfect.

  • The good and the bad

    I am back in the South Bay for my high school reunion, and I visited two places that I have always loved.

    The good: Guitar Showcase

    The place to go to find the finest new and used stringed instruments for the discerning player

    Way back when I first started playing guitar, I was introduced to the legendary Guitar Showcase in Campbell. It is an iconic music store, and naturally has a wide selection of guitars (as well as other instruments).

    They have greatly expanded the store and added a lot of floor space. They still have their vintage room (they also put their jazz boxes and the PRS guitars there, probably to keep us plebes from drooling on them), which is fun to browse. I know they will let you play many of them if you ask, but I have never had the courage to ask.

    One thing that has changed is that they have moved the acoustic guitars from the back room to upstairs, and they have a lot more of them. I played a cherry Taylor nylon string (it felt a little strange: a steel-string neck, with a radius and all), a genuinely awesome-sounding 814ce (I have a 1996-vintage Taylor 814C, no electrics, that is pretty sweet), a dobro resonator, and a few of the Spanish-made classical guitars. I also played a mid-range Yamaha classical guitar that was very nice for $400.

    Downstairs, I was looking hard at the Gibsons. They had a few cherry SGs: the classic, lightweight body and ultra-fast neck, and just gorgeous. They also had a pretty good selection of Les Pauls. One that caught my eye was a $2,200 Gary Moore signature series. I love the simple finish and the feel of that guitar. They also had a very cherry Les Paul standard custom for $1,600 that was ultra nice: very light wear, and a truly sweet guitar.

    I did manage to walk out without dinging my credit card, but I have to admit it was hard.

    The bad: Fry’s Electronics

    As a geek growing up and living in the Bay Area, I always made Fry’s Electronics my first stop for tech products. From the original Sunnyvale location on Lawrence Expressway (now a few blocks north of Lawrence) to the other Fry’s stores, they were always clean, well stocked, and helpful (even if their workers weren’t the brightest bulbs). I bought many a stick of RAM, CPU, motherboard, and power supply there.

    Since our hotel is only a couple of blocks from the Hamilton Avenue Fry’s, we swung by this afternoon. I was horribly disappointed. There were lots of empty shelves, products were poorly stocked, and outside the TV area, the store just looked ratty.

    I guess I can understand that they are no longer the preferred vendor for tech odds and ends. I suspect Amazon and other online resellers are matching and beating their prices, but I was shocked at how ratty Fry’s had become. An icon from my youth/young adulthood is in decline.

    Ah well, 50% isn’t a bad batting average…

  • What I am reading – Catcher in the Rye

    I had read it a long time ago. I think I bought a used copy on one of my trips to Powell’s in Portland, but a recent rerun of the South Park episode “The Tale of Scrotie McBoogerballs” prompted me to pick it up again.

    Certainly one of the best works of the 20th century.

    I am talking, of course, about Catcher in the Rye by J. D. Salinger. Told through the eyes of its protagonist, the story is alive with the references and language that a troubled teen would use. When I was in high school, the words were different, and I didn’t go to a boys-only prep school, but we used the lingua franca of the times in our daily conversation, much as Holden does in Catcher.

    I did not read it as a teen, because at that time it was banned from the library. But had I read it, I would have identified strongly with Holden. I was perhaps not as brusque or abrasive, but, like many, I found high school a tumultuous time.

    The premise of the South Park episode is that the boys are assigned the book and told it used to be banned. Expecting it to be filled with foul language, sexual innuendo, and other titillating tidbits, the boys are disappointed by how tame the book is.

    Likewise, the first time I read this “banned” book, I was looking for the reasons it had been banned, and failed to “get” the whole point of the story. This time through, I am reading it carefully, enjoying and savoring Holden Caulfield’s recounting of his experiences. It is both entertaining and thought-provoking.

    If you haven’t read it, or read it a long time ago, I highly recommend picking up this classic, and re-reading it, perhaps a few times, to truly grok its fullness.

  • Facebook “Promote” post – Don’t waste your money

    The dog who is inspiring the "Jackie's Fund"

    I manage a couple of Facebook pages, one for my employer and one for a non-profit that I work with (Southern Arizona Greyhound Adoption).

    I am constantly barraged by Facebook to “promote” a post to extend its reach, and get more responses. We recently took in a hound who had an injury on the racetrack, and we are doing some fundraising to cover the not insignificant vet costs.

    Our marketing team wanted to try to promote this post to see what the result would be.

    So I signed up for it, gave them my PayPal information, and set a $15 limit. They said that the post would reach between 3,000 and 4,000 people.

    Over the next 24 hours, 4,022 people saw this post on their walls. Of those, 87 people clicked on the post, or roughly 2.2%. Seven people “liked” our page. Our website had 30 extra visitors that day (almost undetectable in the long-term trend), and not one additional donation.
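    To put those numbers in per-dollar terms, here is a quick back-of-the-envelope sketch. It assumes the full $15 limit was actually spent (only the cap is known), so the per-click and per-like costs are upper bounds; promotion_metrics is just an illustrative helper, not anything Facebook provides.

```python
def promotion_metrics(reach: int, clicks: int, likes: int, spend: float) -> dict:
    """Back-of-the-envelope ratios for one promoted post."""
    return {
        "click_through_rate": clicks / reach,
        "cost_per_click": spend / clicks,
        "cost_per_like": spend / likes,
    }


# Figures from the promotion; spend ASSUMES the whole $15 cap was used.
m = promotion_metrics(reach=4022, clicks=87, likes=7, spend=15.00)
print(f"Click-through rate: {m['click_through_rate']:.1%}")  # 2.2%
print(f"Cost per click:     ${m['cost_per_click']:.2f}")     # $0.17
print(f"Cost per page like: ${m['cost_per_like']:.2f}")      # $2.14
```

    Cost per donation is left out on purpose: with zero donations from the promotion, that ratio is undefined.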

    I posted it on my own timeline, with a plea for my friends to cough up a buck or two if they could afford it. That generated the ONLY donation we have received from outside our existing network.

    My conclusion is:

    • Facebook promotions are worthless. They game how far posts propagate to encourage you to pay to promote, but when you do, you get no results.
    • You are better off using your own network and encouraging people to share than using the Facebook tools.
    • Facebook is a lousy vehicle for promotion. You may need to have a presence, but it just doesn’t translate into action from its billion-plus users.

    Oh, and if you want to help us out, and do something for the awesome greyhounds, head over to this link and click the “donate” button.

  • I use Ad blockers, but I am not a dick about it

    I have long been a religious user of ad-blocking software. I started with the first plugin for Firefox back in the day, and now I use Adblock across the board (Chrome, Firefox, and Safari).

    I particularly hate ads on sites that I pay for (NY Times, I am looking at you), or where my information is the principal value to the company behind the site (Google and Facebook fall into this category). But occasionally, I run into a site that politely asks me not to block its ads.

    When I do, 99 times out of 100, I add that site’s domain to my exclude list and deal with the ads. Today that was http://phys.org, a physics news site that I visit occasionally, which showed a message bar alerting me to my use of an ad blocker (which I just don’t think about). These sites almost always have just a few banner ads, and nothing truly annoying.

    I did try using NoScript and Ghostery, but that pretty much destroyed the joy of web browsing (almost as much as my experiments with Tor).

    Of course, occasionally, I browse with IE and I am inundated with ads, so I am never ever going to go adblock free.

  • Music Reflections

    When I first started making money, my goal was to be able to buy LPs of music I liked. I put together a stereo from old components (an old Heathkit amp and tuner, plus some hand-built, not great, speakers), and I splurged on a decent Technics turntable.

    What music did I buy? Well, bands like Kiss and Cheap Trick were all the rage, but I had more eclectic tastes and veered toward what I called “art rock” (now widely called “Progressive Rock” or Prog Rock). Bands like ELP, Yes, and King Crimson were all among my early discs. I remember a DJ on KSJO who played all this awesome music and got me into a lot of great tunes. Greg Stone was his name, and from him I got into Camel, Gary Moore, Allan Holdsworth, Jeff Beck (and the Yardbirds), The Moody Blues, and more.

    I started playing guitar, and then my tastes ran toward harder rock. Led Zeppelin, UFO, Michael Schenker Group, Scorpions, Y&T, Ratt, Steeler, and Yngwie Malmsteen were all on my daily playlist. And who could forget Rush, especially the early work (through Moving Pictures)? My collection now also holds lots of what is called “classic rock”: Pink Floyd, AC/DC, Kansas, and so on.

    As time went on, I kept adding from these genres, not really straying far. As some friends veered into country, or modern rock from the ’80s, I stayed true to my roots. I got more into the blues, listening to Robert Johnson, Johnny Winter, and a wide swath of Eric Clapton.

    Today, I still listen to mostly the same genres. There are some new additions: Porcupine Tree, Special Machines, Marillion, Collective Soul, Blues Saraceno, Joe Satriani, Steve Morse, and Frank Marino and Mahogany Rush. So much awesome music, so little time.

    I was thinking about this because my 30th high school reunion is next weekend. I realized that I still listen to much of the same music as I did then, and really haven’t shifted into different or unfamiliar genres. I guess the habits learned then will stick with me forever.

    The more things change, the more they stay the same…

  • Modest Ambitions

    The foot injury is getting better, but alas, it is still pretty painful. I am pretty sure that it is a pulled or torn ligament in the top of the foot (the one for the big toe).

    I am cutting back on the vitamin I, and still resting and icing it. But I know I will be going easy on it for the foreseeable future. Thus, no exercise, and that makes for a sad panda.

    My goals for the next two weeks are going to be to get this healed, and to not destroy my diet. Of course, with the trip next week and the reunion I am probably going to blow my calorie budget. So much for holding it constant with exercise.

    C’est la vie.