Saturday, November 18, 2006

A Fit on Fair and Balanced

(Tonight's image borrowed, with respect, from Paul's Justice Blog).

In case it's not been crystal clear in the times I've blogged about her, I adore my wife...and I'm very grateful we've both decided to revise, re-create, and re-release our relationship (i.e., Marriage 2.0).

There are nights we sit on either side of the dinner table, dueling laptops, keyboards, and mouse clicks. Generally, she's working on her latest poem or reading other people's blogs. I'm not yet that disciplined. Sure, I write my posts, but you've probably noticed that I'm not yet in the habit of writing daily. Some nights, I work on my other blogging responsibilities, such as reading the sites of people who may be turning into friends, leaving comments, and trying to create a wee bit of humor in my wake.

Oh, I know, it's bad form to wait for inspiration, but sometimes, it's worth the wait. (For example, my recent attempt at handling a Poetry Thursday challenge.) And sometimes, the post is inspired by our conversation.

For example, JP is (currently) working on a poem using a form called a tanka, one that uses a specific measure of syllables.

While I like being near her when she's working for many reasons, one of the most rewarding is when she asks me a question about grammar or pronunciation. While I have written computer programs for much of my professional career, I also have a wide and deep knowledge of the English language.

As a result, I have a unique blend of creative and technical interests. One such interest surfaced during a discussion that JP and I had tonight. There is a site she uses to verify syllable count for words that can be pronounced differently in different parts of this country. For example, the word "prayer" is pronounced with one syllable on the West Coast. If you listen to people from various places in the so-called American "Deep South" pronounce the word, you will hear two distinct syllables.

A few years ago, I was asked to design a spell-check component, in part because my employer at the time was too cheap to pay for an add-in (it was pretty expensive). For a different part of the same project, I was also trying to design a way of locating names phonetically, so that typing "JOHN" would lead you to Johnson, Johnssen, and other common variations. So I spent some time investigating the composition of words and the patterns of pronunciation.

During my investigation, I learned about the most common algorithm used by computer programs to count syllables, which is to count the number of vowel groups in a word. For example, the word "vowel" has two vowel groups (O and E); according to this algorithm, the word has two syllables.

It's a pretty easy algorithm to write in any computer language. Unfortunately, it's also very flawed. Consider, for example, the word "neon." Its only vowel group is EO, so according to the algorithm it should have one syllable, but in "normal" pronunciation, there are two syllables. The most commonly used algorithm "breaks" (that is, gives the wrong answer) when it encounters words that break syllables between adjacent vowels. With one set of words claiming to represent a decent subset of the language, the algorithm was wrong 25% of the time; that is, it was wrong on one in every four words.
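For the curious, a minimal sketch of the vowel-group approach, in Python, might look like the following. The function name and the choice to treat only A, E, I, O, and U as vowels are assumptions made for illustration; this is not the code behind any particular site or product.

```python
import re

# A "vowel group" here is any run of consecutive vowels.
# Assumption: only a, e, i, o, u count as vowels (no special handling of "y").
VOWEL_GROUP = re.compile(r"[aeiou]+")


def count_syllables_naive(word: str) -> int:
    """Estimate the syllable count by counting vowel groups in the word."""
    return len(VOWEL_GROUP.findall(word.lower()))


# "vowel" has two groups (o, e); "neon" has only one (eo),
# even though most speakers pronounce it with two syllables.
for word in ("vowel", "neon"):
    print(word, count_syllables_naive(word))
```

The undercount on "neon" happens because the pattern cannot tell a vowel pair that is spoken as one sound from a pair that is split across two syllables.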

(I actually did manage to design a far more accurate solution for my project, but that's not the point of this post.)

I mention this because the first word I typed into the site JP pointed me to was, of course, "neon." Unfortunately, it reported the word as having one syllable. I punched in a few of the other words I remembered from my earlier investigation and it failed on each of them. It seems pretty clear that whoever implemented the code underlying the site started their investigation pretty much the same way I did. They found the same sources and ran into the same algorithm. However, unlike me, they trusted the information they found and implemented their site with it.

I wonder how many writers, school-age children, and other users will innocently presume that this site is authoritative and not even realize they may be getting invalid results from it. I wonder what they'll think when the projects they work on while using this site do not receive the acclaim (or marks) these users might think they deserve. Will they realize they used flawed tools? Or will they blame those trying to correct their mistakes? (Yes, these questions are at least slightly rhetorical.)

As writers (and computer users), we rely on a number of tools to accomplish our tasks. As the Internet continues to evolve and democratize the production and dissemination of information, new resources and tools appear daily. Things get easier. Word of mouth celebrates new tools and kits.

However, mistakes also become easier to make. And such mistakes become larger as they ripple outward, like a virus innocently transmitted.

I'm sure my darling JP would never create a tanka that relied on using the word "neon" as a single-syllable word. However, I could see someone with less knowledge of this language making that mistake. To be honest, many media outlets appear to have become a little too "loose and goosey" with regard to their fact checking. (Remember the story that hastened the end of Dan Rather's career?) It would be terribly unfortunate to have one's faith in a source, or a resource, lead to a story (or a post) with, um, inaccurate conclusions based on what used to be described as "flawed data". Embarrassing, really.

In traditional journalism classes, you're taught to verify your information from at least two completely unrelated sources. Make sure you're being at least as diligent. Do this in your professional work...and in your personal relationships. There are many times that, as people, we react without fully understanding "the other side of the story."

Be diligent in all facets of your life. Please.

5 Comments:

Blogger JP (mom) said...

Hello, Papa Tango Charly!! How fun it is to see our nightly conversations, thoughts, etc show up here... tu est tres magnifique est je t'aime beaucoup. x...x, JP

12:13 AM  
Blogger paris parfait said...

Excellent points - and as someone who grew up in the South but has spent most of her adult life elsewhere I appreciate how people pronounce words differently, with one syllable or more. One of my friends who works for HP in Holland says that computers and programs are now at the place where cars were when they were first invented - lots of bugs, fragile and easy to crash. But there's hope as technical improvements are made and programs are created to more accurately mirror the real world behaviour. Great post!

1:30 PM  
Blogger Mcglk said...

I'm curious as to what algorithm you eventually settled on.

3:10 PM  
Blogger Mcglk said...

One other thing.

http://mcglk.blogspot.com/

3:12 PM  
Blogger Doe said...

Without doubt waiting for inspiration to come is best, but you know what? I find that the more one blogs, the better one gets at it! Even if it’s just crazy incoherent thoughts.. Like the ones that make up mine for example? anyway thanks for stoppin' by : )

5:56 AM  
