Jason Brome

London Calling

Posted on September 8, 2003 by jason

For those of you who haven’t heard, the guys behind Kazaa have recently launched Skype, a new Internet telephony service. What’s the difference between this and previous voice chat and Internet telephony offerings? The Skype team, leveraging their knowledge of P2P software, seems to have created a solution that finally relieves the user of the burden of worrying about firewalls, NAT, and the headache of configuring them.

Here’s the basic premise: you install the program, and it works. Now, I was a little sceptical when I first heard this claim, so I decided to give it a try. While I was at home, I set up a call with a colleague at work who had installed the software. My home machine is behind a Linksys box, and the work machines are behind a locked-down firewall. Previous efforts to get some form of IP-based voice communication working have been fruitless; typically they required us to punch Swiss-cheese-like holes in our firewall. However, with Skype, I looked up my colleague in the directory, clicked on the dial button, and moments later we were in the middle of a voice conversation. The quality is not perfect – yet – but for the majority of the time we were talking in crystal-clear (better-than-telephone quality) audio.

I hope this marks the beginning of the next wave of user-friendly IP-based telephony solutions. There’s been a reasonable amount of activity in the Voice-over-IP arena over the past couple of years, especially with the growth of SIP and services like Free World Dialup and Vonage, the launch of SIPPhone, and the ongoing efforts of projects like Vovida, Asterisk, and Bayonne. Skype has a temporary problem in that it does not interoperate with any of these VoIP solutions; however, they do plan on supporting both SIP and Plain Old Telephone Service (POTS) calling at a later date.

Ultimately these solutions give the consumer more choice over how to make voice calls. Applications like Skype push the barrier in terms of call quality and price (did I mention it is currently free?). Assuming the person on the other end of the phone has a reasonably good Internet connection, why would I want to pay 5-10c or more a minute to the phone company to route the call?

Posted in Technology | Comments Off

A Change in Persistence

Posted on September 6, 2003 by jason

Today I have finally come to the conclusion that nntp//rss, as of v0.4, will no longer use hsqldb as its embedded database. It has been a bit of a rocky road over the past few months, with memory consumption issues and reliability problems, especially after system failure. I’ve spent some time evaluating alternatives, and have decided that Mckoi may be (see update below) the replacement.

I’m currently finishing up a migration process to upgrade v0.3 users’ hsqldb databases to Mckoi, and will test to see how the whole thing behaves on my 100K+ item database.

Note that this is in addition to the MySQL support that has also been implemented. The demo server has now been running on MySQL for about a month, with good results. This will play a key role in the more group-oriented features to come in later releases of nntp//rss. Which, by that time, may deserve a rethink on the name…

Update: Stop the press… somebody referred me to Axion, a “small, fast, open source relational database system (RDBMS) supporting SQL and JDBC written in and for the Java programming language”. Will be checking this out as well.

Update 2: Looks like my 100K record test is proving an interesting challenge. The first part of the challenge is getting the data out of hsqldb. As hsqldb – to the best of my knowledge – does not implement any disk scratch strategies for queries, it creates the entire result set within memory. I’m now performing iterative queries across the items table in my nntp//rss database to extract the content. This is turning out to be a long process. Anyway, once I’ve got the data out of hsqldb and into Mckoi, Axion et al, I’ll focus initially on the following areas:

Start up speed, given a non-trivial database
Update/Insert speed
Querying across a large recordset (on indexed and non-indexed columns)
Shutdown speed

Watch this space for further updates!
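
The iterative extraction mentioned in Update 2 could look something like the sketch below. It is only an illustration of the idea, not the actual nntp//rss migration code: Python's built-in sqlite3 stands in for hsqldb, the `items` table layout and the `iter_items` helper are made up for the example, and the exact LIMIT syntax differs between databases. Each query asks only for the rows after the last id seen, so no single result set has to hold the whole table in memory.

```python
import sqlite3


def iter_items(conn, batch_size=1000):
    """Yield rows from the items table in batches, walking forward by id."""
    last_id = -1
    while True:
        rows = conn.execute(
            "SELECT id, title FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Remember where this batch ended so the next query starts after it.
        last_id = rows[-1][0]


# Demo with a throwaway in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item {i}") for i in range(2500)],
)
batch_sizes = [len(batch) for batch in iter_items(conn, batch_size=1000)]
```

Walking by id rather than by offset keeps each query cheap on an indexed column, which matters when the source database builds every result set in memory.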

Update 3: Managed to port 100K database to Mckoi, but getting some strange exceptions. Still investigating. Also looking at the potential of other persistence mechanisms, e.g. JDBM and SMYLE. Axion looks interesting, but has a more limited range of features (no multiple aggregate functions within a select, no GROUP BY).

Posted in nntp//rss | 4 Comments

Smaller, Faster, Better Orange – quickSub 0.2

Posted on August 27, 2003 by jason

quickSub 0.2 has just been released. It’s now a considerably smaller script, with some minor re-design, links to more aggregators and newsreaders, and a [What is this?] link for people wondering what these strange orange XML buttons mean…

Posted in quickSub | Comments Off

Being Vocal – The Commandments of Voice User Interface Design

Posted on August 22, 2003 by jason

In a previous job, my focus was on wireless and voice technologies. I architected a multi-modal application platform, whose initial purpose was to support Personal Information Management across a disparate range of interaction devices. At that time I immersed myself in the world of Voice User Interface design – an area not only driven by technology, but also by psychology. In fact, the dominant factors in producing a successful voice interface are quite distant from the underlying technology. Key factors of persona, dialog flow, and how to create an engaging user experience derive a lot from human psychology.

Where am I going with this, you may well ask? Well, during some downtime at the end of last year, I put together a collection of notes that I named the “Ten Commandments of Voice User Interface Design”. These notes were a result of my education process; a process which included a diverse range of reading, attendance at some of the more popular Voice and Speech technology conferences, along with capturing my own personal experiences working with the team to build our VUI. My goal for the notes back then had been to produce a nice formalized document, turn it into a PDF, then let it loose on the web. Well, that never happened. Move forward 9 months: I was sorting through some old files on my machine, and happened to come across this same document. Therefore, rather than letting my (unedited) effort go to waste, I’ve decided to publish it here, in its raw form, for your consumption.

Feedback and comments are most welcome. Hopefully for people new to the world of VUI design, this’ll provide some interesting pointers on the road to successful Voice User Interface design.

Usability testing – it’s not a big phrase, and you need to embrace it from the beginning of any voice project.
If the client has an existing Interactive Voice Response (IVR) system, research it, and ask the users what could be or should be done better.
The Wizard of Oz isn’t only a classic movie, but also a testing methodology through which you can tell your app is on the right road to Kansas. (e.g. SUEDE)
Real testing with speech recognition doesn’t require a full production application: Prototype your dialogs and grammars using VoiceXML, and dummy up the back-end data. Your users will still have a realistic experience without you having to build the whole application.
It’s not only what the test users say… when you’re observing test users running tasks on your application, make sure that you not only listen to them, but watch them as well. Recording test sessions on Closed-Circuit Television may provide feedback that cannot be garnered from audio recordings alone.
Record what the users think happened, and analyze what actually did happen. Post-task questionnaires are a great way of collecting a user’s feedback on how they thought a task progressed. Capturing call flows, utterances, and system error feedback is also a great way to see how the call actually went. Use the two together to improve your design. Questionnaires provide quick feedback to high-level managers about how their potential users think the system is progressing. Call analysis provides system designers with great feedback on how users are reacting to the system.
Give users the opportunity to provide free-form feedback. A couple of ways to do this:
- In a questionnaire, offer the user some space for their general comments about the system. You may capture useful data here that is outside the scope of your set of questions.
- Where the application permits, give the user a chance to record a feedback message at the end of their call. This may have a bias, as only those users who get to that stage, or choose the menu option, will be exposed to this process – but this is still a useful channel for feedback.
Application and usability evaluation are ongoing processes. Once the system reaches pilot, and even production, status, there is still great room for improvement. Implement review processes that capture both user feedback and sample interactions against the system. Perform grammar reviews to ensure your grammars capture the users’ standard range of commands.
Remember: You are guiding the user. Do not give the user the impression the system can do more than it is actually capable of. Be very careful with open-ended questions (“What can I do for you today?”) unless you have a well-defined grammar, and some great back-tracking logic. Also, open-ended questions place the onus on the caller to say the right thing. You can end up with quickly disappointed users unless your grammars are great or your error handling is appropriate.
Error handling is crucial. If a user does not understand their options, or is distracted, you need to re-establish with them where they are, what their options are, and what they can do next.
It is not only what the user hears the system saying… the non-speech audio counts as well. Audible cues can help users understand where they are in the system (landmarks), or when the system is expecting input. Advanced users can be prompted about barge-in capability through use of a simple alerting tone.
Persona does matter – survey results have indicated that different voices can instill different feelings within the users of a system. As part of your usability testing, try out a few dialogs with different personas, and record characteristic preferences about each of the personas.
The terminology that management uses may never be the same as the terminology used by customers. Build the system for the people who will use it – the customers! – and make sure that your grammars are built with their language-set in mind.
Customer Service Representatives (CSRs) are a fountain of knowledge. In an existing environment, monitoring of the most frequent activities provides a great source of ideas for automation. In addition, CSRs can provide information about user information patterns: the ways in which users usually provide information (e.g. when booking a travel ticket, always providing the destination first, then usually being prompted for the point of origination).
Map out your users’ mental model. This process is key to developing a system that is compatible with your users. Activities include card sorting – writing out various activities intended to be offered by your application, and getting the users to sort them into groups. Get the users to label those groups – this may provide a valuable insight into how users view your organization, and what they would expect of an interactive system. Know as much as possible about the target users of the system before you start to design the application. This model is crucial.
Be careful with TTS: Test your users’ acceptance levels, define boundary points where usage is acceptable. Determine migration plans where TTS prompts can be replaced with recorded audio. Test in conjunction with your personas to ensure no detrimental effects. Don’t forget that there are many regionalized TTS engines out there, so if TTS has to be used, try and find a match that fits well with your users.
Test users do not exist in your company or, if you’re a consulting organization building a VUI for a client, within your client’s company or your own. Whilst some of those people may be customers as well as employees, a degree of separation ensures that the results are unaffected by knowledge of corporate structure or terminology, i.e. you get the results of the average customer.
Make sure your users are aware of universal commands, especially ‘Help’, and the ability to return to a ‘Main Menu’. This should be an integral part of any error management / backtracking system. Do not place users in a state where they feel they cannot escape.
0 is typically used to drop through to the operator. If your users do drop through to the operator, consider capturing the reason why this happened. Or, at least, capture enough information to review why the user did not proceed in using the speech interface. Again, great usability feedback.
When getting users to test the system, task ordering can be important. While some randomization (or controlled randomization – Latin Square Design) is a good thing, it may be worth keeping some tasks to the end. Making the drop-through to the operator the first task may have a serious effect on the rest of your results.
Avoid cognitive overload – guide your users as necessary – don’t make them forget what they’re supposed to be doing, or what options they have.
Start testing early. Real early. See Wizard of Oz.
Voice verification can offer a high security environment, without requiring the user to say lots of information. Think about the potential for up-selling/cross-selling, extra personalization, and reduction of fraudulent activity.
When testing new systems with existing customers, consider setting up ‘bogus’ accounts – users will feel more comfortable knowing that they’re testing without their personal data being exposed to third parties.
The world is a wireless place – make sure that when you are testing, you are not only using people on landlines, but from a wide range of sources. Have a fair share of people calling from landlines (corded and cordless phones), from cellphones, and also, where appropriate, from overseas. Make sure that the application works right, and also make sure that the speech recognition rate is acceptable.
Diagramming applications (e.g. Visio) are your friend. When mapping out voice dialog flows, in addition to using a standard white board, why not project Visio onto the board using a projector? You can quickly capture and rearrange flows during design meetings, and instantly have them in a distributable form.
Too much of something can be a bad thing – while you may want to offer every single one of your services through a voice interface, that may be a bad idea. Too many options can make a system very confusing to the user, especially if only 20% of the choices are used 80% of the time. Look to eliminate complex, low-volume functionality where appropriate.
What’s the time? Well, if it’s Friday night, it might actually be Saturday morning. Watch out for the 12am timing window. A calendar classes tomorrow as beginning at 12am. Most humans class it as when they wake up. If you’re accepting times and relative dates from users, factor this into your design, and test out your assumptions in your usability tasks.
Don’t be negative about errors, especially when it’s your users who experience them. Make an error scenario (a no match, no input) a positive experience, give the user reinforcing information about what is expected of them at that stage, and the available options.
While it’s good to confirm, it isn’t good to confirm everything one step at a time – unnecessary confirmation could double the length of your user’s dialog. Use features such as n-best and stop lists to be intelligent about interpreting input, then ask for confirmation where necessary.
Usability is great, but unless you do something with the results, it is worthless. Usability testing and the design process should be intrinsically linked. Without one, the other is useless. Feed information from your usability testing into your design process. Retest the usability of improved designs.
Regionalization, Localization, Internationalization. Be aware of the requirements for your application to serve people from more than one location, or more than one country, speaking more than one language. This will not only affect your grammars and prompts, but could impact your dialog flow as well. Test with appropriate groups, to capture essential feedback.
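
To make the task-ordering point above a little more concrete, here is a minimal sketch of controlled randomization using a cyclic Latin square, with selected tasks pinned to the end of every session. The task names and the `counterbalanced_orders` helper are invented for illustration. Note that a simple cyclic square balances position only; it does not balance which task follows which, so a more careful design may be needed if carry-over effects are a concern.

```python
def counterbalanced_orders(tasks, pinned_last=()):
    """Return one task order per participant group.

    Rotating tasks form a cyclic Latin square (each task appears once in
    each position across the groups); pinned tasks always come last.
    """
    rotating = [t for t in tasks if t not in pinned_last]
    n = len(rotating)
    return [
        [rotating[(row + col) % n] for col in range(n)] + list(pinned_last)
        for row in range(n)
    ]


# Hypothetical usability tasks; the operator drop-through is kept for last.
tasks = ["check balance", "transfer funds", "pay bill", "reach operator"]
orders = counterbalanced_orders(tasks, pinned_last=("reach operator",))
```

Each participant group is then assigned one row of `orders`, so no single task is always attempted first.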
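
The 12am timing window described above can be handled with a small rule when resolving relative dates. The sketch below is one possible approach, not a standard: the 4am cutoff and the `resolve_tomorrow` name are assumptions, and the right cutoff is exactly the kind of thing to check in usability tasks. In the small hours it treats "tomorrow" as the calendar day that has already begun, and flags the answer so the dialog can confirm it with the caller.

```python
from datetime import datetime, timedelta


def resolve_tomorrow(now, cutoff_hour=4):
    """Map a caller's spoken 'tomorrow' to a calendar date.

    Returns (date, needs_confirmation).
    """
    if now.hour < cutoff_hour:
        # After midnight but before the cutoff: the caller most likely
        # means the day they will wake up to, which is already "today"
        # on the calendar. Ambiguous, so ask for confirmation.
        return now.date(), True
    return now.date() + timedelta(days=1), False


# Friday evening and the early hours of Saturday resolve to the same day.
late_friday = resolve_tomorrow(datetime(2003, 8, 22, 21, 0))
after_midnight = resolve_tomorrow(datetime(2003, 8, 23, 0, 30))
```

Returning the confirmation flag alongside the date keeps the policy decision (confirm or not) in the dialog layer rather than burying it in the date logic.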
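
The advice above about n-best and stop lists can also be sketched in a few lines. This is a simplified illustration, assuming the recognizer hands back a list of (text, confidence) pairs; the thresholds, the city names, and the `choose_hypothesis` helper are all hypothetical. Hypotheses the caller has already rejected go on the stop list and are skipped, high-confidence results are accepted without a confirmation step, and only mid-confidence results trigger a confirmation prompt.

```python
def choose_hypothesis(n_best, rejected, accept_at=0.85, confirm_at=0.5):
    """Pick the best hypothesis not on the stop list and decide what to do.

    Returns (text, action), where action is "accept", "confirm" or "reprompt".
    """
    ranked = sorted(n_best, key=lambda h: h[1], reverse=True)
    for text, confidence in ranked:
        if text in rejected:
            continue  # the caller already said "no" to this one
        if confidence >= accept_at:
            return text, "accept"
        if confidence >= confirm_at:
            return text, "confirm"
        break  # everything further down the list is even less likely
    return None, "reprompt"


n_best = [("Boston", 0.62), ("Austin", 0.58), ("Houston", 0.20)]
first_try = choose_hypothesis(n_best, rejected=set())
second_try = choose_hypothesis(n_best, rejected={"Boston"})
confident = choose_hypothesis([("Boston", 0.93)], rejected=set())
```

After the caller rejects "Boston", the next prompt offers "Austin" instead of asking them to repeat themselves.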

I certainly cannot take credit for all the ideas mentioned here. These ideas and thoughts are derived from a collective body of research that goes back many years; research that still continues to be refined to this day. And, while I’m less involved in voice-related solutions today, I still find it a fascinating application of technology.

Finally, if you’re just getting into the world of Voice User Interfaces – and their associated technologies VoiceXML, SALT, and CCXML for Call Control – you might want to check out the following vendors who provide development communities: Voxeo, Voxpilot, BeVocal, and Tellme.

Posted in Technology | 7 Comments

Down South

Posted on August 11, 2003 by jason

I’ll be at HP World in Atlanta for the next few days. My employer, Envoy Technologies, has a booth in the Expo. If you’re in the neighborhood, pop along and say hello – we’ll be over at booth 237.

Posted in Uncategorized | Comments Off

Making feed subscribing easier…

Posted on July 25, 2003 by jason

Here’s my quick hack for the week. As a little diversion from nntp//rss, I thought I’d create a way to make it easier for people to subscribe to my RSS feed. There are various initiatives to make this process easier (e.g. Syndication Subscription Service, custom URI scheme), but I thought I’d take a different approach.

So, let me introduce to you quickSub – a quick (and some might say dirty!) way of making subscribing easier for your visitors. Just roll your mouse over the XML icon on my blog’s main page, and you’ll be instantly presented with a series of customized subscription links for popular aggregators. Click on one, and, voila – you’re subscribed. No dependencies on particular browsers, and no need to introduce a custom URI scheme (and the handlers to support that). Not that I’m saying those approaches are bad, just that this is something you can use today, until a better solution comes along.
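
For anyone curious about the general idea, building per-aggregator subscription links boils down to dropping the feed URL into a link template for each aggregator. The sketch below is not the quickSub script itself (which runs in the browser); the aggregator names and URL templates are placeholders invented for illustration, since every real aggregator defines its own link format.

```python
from urllib.parse import quote

# Hypothetical link templates, one per aggregator. Real aggregators
# each use their own scheme; these are placeholders only.
TEMPLATES = {
    "ExampleReader": "http://reader.example.com/subscribe?url={feed}",
    "LocalAggregator": "http://127.0.0.1:8080/add?feed={feed}",
}


def subscription_links(feed_url):
    """Return a {aggregator name: subscription link} mapping for a feed."""
    encoded = quote(feed_url, safe="")  # percent-encode the whole feed URL
    return {name: tpl.format(feed=encoded) for name, tpl in TEMPLATES.items()}


links = subscription_links("http://www.example.org/index.rdf")
```

Adding support for another aggregator then only means adding one more template entry.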

I’m very open to feedback – any suggestions, enhancements, comments, whatever – post a message on the quickSub forum. If you’re an aggregator writer and I’ve missed you off the list, just send me an email, and I’ll add you in the next version.

Posted in Technology | 13 Comments

Reading Blogs in your newsreader? Check it out!

Posted on July 23, 2003 by jason

I’ve set up a demo server for those people who have yet to experience nntp//rss. Check out the following post to the nntp//rss forum for more information:

http://www.methodize.org/forum/viewtopic.php?t=15

By the way – this server is running a work-in-progress copy of v0.4, with a special twist. It’s running using the new MySQL support that’ll be part of this release. hsqldb is great for single user or small group deployments, but if you are considering using nntp//rss to serve a larger group of users, MySQL scales quite nicely.

Version 0.4 will allow you to use either solution – hsqldb for standard deployments, and MySQL for those that require higher performance.

p.s. The demo server is running on a trial basis – and may be subject to interruption at any time. If you can’t reach it, try again later!

Posted in nntp//rss | Comments Off

Syndic8 XML-RPC

Posted on July 20, 2003 by jason

Recently I’ve been working on integrating search functionality into nntp//rss. Syndic8.com offers up an XML-RPC API, and I thought that this would be a great way to integrate the search functionality. Feedster are also just about to release an XML-RPC API, and this too will be integrated into nntp//rss.

I’m using the syndic8.FindFeeds and syndic8.GetFeedInfo operations. FindFeeds allows you to initiate a search by supplying a search term. It returns a list of matching Feed Ids as its response. GetFeedInfo is used to retrieve the feed information, taking a list of Feed Ids as its input. It also allows you to narrow down the set of returned feed attributes, which enables optimization of the amount of data returned from Syndic8 to the XML-RPC client. Unfortunately the documentation did not seem to provide a list of the specific field names that could be specified, so, to help other people working with the API, I’ve listed below all the fields returned by GetFeedInfo.
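
As a rough illustration of the two calls, the sketch below builds the XML-RPC request bodies with Python's standard xmlrpc.client module, without contacting any server. The argument shapes follow the description above (a search term for FindFeeds; a list of feed Ids plus a list of field names for GetFeedInfo) and should be treated as assumptions rather than a verified signature. The feed Ids are made up; the field names come from the list further down.

```python
import xmlrpc.client

# Search request: a single search term, per the description above.
find_request = xmlrpc.client.dumps(
    ("rss",), methodname="syndic8.FindFeeds"
)

# Lookup request: made-up feed Ids, plus the subset of fields wanted back.
info_request = xmlrpc.client.dumps(
    ([42, 43], ["sitename", "dataurl", "description"]),
    methodname="syndic8.GetFeedInfo",
)

# Round-trip the second request to show what a server would decode.
params, method = xmlrpc.client.loads(info_request)
```

Against a live endpoint, the same calls would go through `xmlrpc.client.ServerProxy(url)`, with `url` being whatever XML-RPC address the service publishes.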

My only current issue with the API is the fact that the FindFeeds function performs an unbounded search – I can’t seem to limit it to a maximum number of results. If someone, for example, searches using the term ‘RSS’, they’ll get a list of over 15000 feed Ids. I’d like to be able to say ‘give me the first 200 matches’, which will not only result in a more compact response from Syndic8, but should also place less stress on their servers.
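
Until a server-side limit exists, one client-side workaround is to cap the Id list and request feed details in batches. This does nothing about the size of the FindFeeds response itself, but it does bound the follow-up GetFeedInfo traffic. The sketch below assumes nothing about the API beyond a list of Ids; the helper name and the batch size of 50 are arbitrary choices for illustration.

```python
def cap_and_batch(feed_ids, max_results=200, batch_size=50):
    """Keep the first max_results Ids and split them into lookup batches."""
    capped = feed_ids[:max_results]
    return [
        capped[i:i + batch_size]
        for i in range(0, len(capped), batch_size)
    ]


# Simulate an unbounded search result of 15000 feed Ids.
id_batches = cap_and_batch(list(range(15000)))
```

Each batch can then be passed to a separate GetFeedInfo call, so the client only fetches details for the results it intends to show.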

Field list
archivable
Categories
creator
cur_polling_interval
dataurl
date_approved
date_created
date_good_parse
date_xml_changed
description
editor
faultCode
faultString
feedid
fetchable
Geo-IP
headlines_per_day
headlines_rank
imageurl
lang_code
last_etag
last_poll_time
last_pollid
license_id
Locations
meta_scraped
origin
poll_status
publisher
ref_feedid
rss_version
s8_owner_userid
scraped
sitename
siteurl
status
toolkit
toolkit_version
views
webmaster

Posted in nntp//rss | 1 Comment

Which newsreader do you use with nntp//rss?

Posted on July 15, 2003 by jason

A quick request for nntp//rss users: I’ve created a poll on the nntp//rss message board to find out which newsreaders are used by nntp//rss users. You can find the poll at:

http://www.methodize.org/forum/viewtopic.php?t=12

If you have any tips for effective nntp//rss-enabled news reading within your newsreader, please feel free to add them to the topic.

Update: I’ve switched to different polling software that supports anonymous voting. The poll is now located at:
http://www.methodize.org/poll/newsreaderpoll.php

Update 2: Poll is now on the main nntp//rss page: (ok, this is the last change!)
http://www.methodize.org/nntprss

Posted in nntp//rss | 4 Comments

nntp//rss and Echo

Posted on June 27, 2003 by jason

Just a quick note to say that I’m closely watching the progress of the Echo project. I’m interested to see what transpires from the discussions, especially from a content syndication perspective. Once the spec becomes formalized, I’ll look to include support for aggregation and publication, alongside RSS and the existing blog publishing APIs, within nntp//rss.

Posted in Uncategorized | 2 Comments