admin
Re: betaRC first test - 2010/06/10 16:52
Thanks for the feedback.
I'm not sure you were making a mistake; your example of Ak-Ki-Do is quite a tricky one.
The difference is that, with the regular expression used in 2.6, the hyphen is not counted as a word character, so Ak-Ki-Do gets split into three separate words.
In 2.7 a different technique is used: the text is split on anything that is either white space or a punctuation character. It seems that the hyphen is NOT treated as a punctuation character, and it certainly isn't white space, so on that principle Ak-Ki-Do is a single word.
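To make the contrast concrete, here is a minimal sketch of the two splitting rules. It assumes the 2.6 behaviour is roughly "split on any run of non-word characters" and the 2.7 behaviour is "split on white space or punctuation, where the hyphen is not counted as punctuation". The function names and the exact punctuation set are hypothetical illustrations, not the actual code from either release.

```python
import re
import string

def split_26_style(text):
    # Hypothetical 2.6-like rule: split on runs of non-word characters.
    # The hyphen is not in \w, so "Ak-Ki-Do" breaks into three words.
    return [w for w in re.split(r"\W+", text) if w]

# Hypothetical punctuation set for the 2.7-like rule: ASCII punctuation
# with the hyphen removed, mirroring the observation that the hyphen
# was not treated as punctuation.
PUNCT_NO_HYPHEN = string.punctuation.replace("-", "")

def split_27_style(text):
    # Split on any run of white space or (non-hyphen) punctuation.
    pattern = "[\\s" + re.escape(PUNCT_NO_HYPHEN) + "]+"
    return [w for w in re.split(pattern, text) if w]

if __name__ == "__main__":
    sample = "I study Ak-Ki-Do."
    print(split_26_style(sample))  # ['I', 'study', 'Ak', 'Ki', 'Do']
    print(split_27_style(sample))  # ['I', 'study', 'Ak-Ki-Do']
```

With the first rule a search for "Ki" would match the hyphenated term; with the second only the whole word "Ak-Ki-Do" is indexed.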
I had to think about that for a while, but it probably does make sense. On reflection, it does seem better to treat hyphenated words as whole words rather than split them into their constituent parts, at least as a general principle.
Unfortunately, no scheme will work ideally in every case - language is just too varied for that!
Martin Brampton aka Counterpoint http://aliro.org http://black-sheep-research.com