'From Squeak3.5 of ''11 April 2003'' [latest update: #5180] on 24 August 2003 at 3:12:56 pm'!
"Change Set:		IntlChars-ar-dgd
Date:			24 August 2003
Author:			Andreas Raab

Enable the use of international characters such as umlauts. Most of the work was done by Andreas; Diego Gomez Deck made some small fixes.

Diego's work:
- a call to String initialize in the postscript.
- some small fixes in Character class>>initializeClassificationTable for Spanish letters.

These upper/lowercase pairs need checking by speakers of languages other than Spanish, German and English:
($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $)

WARNING: This change set has been manually rearranged. Take care when trying to file it out again. The definitions of Character class>>initialize and Character class>>initializeClassificationTable, followed by an evaluation of 'Character initialize', must be the first steps after the Character class redefinition.
"!

Magnitude subclass: #Character
	instanceVariableNames: 'value '
	classVariableNames: 'CharacterTable ClassificationTable LetterBits LowercaseBit UppercaseBit '
	poolDictionaries: ''
	category: 'Collections-Text'!

!Character class methodsFor: 'class initialization' stamp: 'dgd 8/24/2003 14:47'!
initialize
	"Initialize the character classification table."

	self initializeClassificationTable! !

!Character class methodsFor: 'class initialization' stamp: 'dgd 8/24/2003 15:10'!
initializeClassificationTable
	"Initialize the classification table. The classification table is a compact encoding of the upper- and lowercase variants of each character:
	- bits 0-7: the lowercase value of this character
	- bits 8-15: the uppercase value of this character
	- bit 16: the lowercase bit (i.e., isLowercase == true)
	- bit 17: the uppercase bit (i.e., isUppercase == true)"

	| ch1 ch2 |
	LowercaseBit := 1 bitShift: 16.
	UppercaseBit := 1 bitShift: 17.

	"Initialize the letter bits (i.e., isLetter == true)"
	LetterBits := LowercaseBit bitOr: UppercaseBit.

	ClassificationTable := Array new: 256.
	"Initialize the defaults (neither lower nor upper case)"
	0 to: 255 do: [:i |
		ClassificationTable at: i + 1 put: (i bitShift: 8) + i].

	"Initialize character pairs (upper-lower case)"
	#(
		"Basic Roman"
		($A $a) ($B $b) ($C $c) ($D $d) ($E $e) ($F $f) ($G $g) ($H $h) ($I $i) ($J $j) ($K $k) ($L $l) ($M $m)
		($N $n) ($O $o) ($P $p) ($Q $q) ($R $r) ($S $s) ($T $t) ($U $u) ($V $v) ($W $w) ($X $x) ($Y $y) ($Z $z)
		"International"
		($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $)
		"International - Spanish"
		($ $) ($ $) ($ $) ($ $)
		"International - PLEASE CHECK"
		($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $) ($ $)
	) do: [:pair |
		ch1 := pair first asciiValue.
		ch2 := pair last asciiValue.
		ClassificationTable at: ch1 + 1 put: (ch1 bitShift: 8) + ch2 + UppercaseBit.
		ClassificationTable at: ch2 + 1 put: (ch1 bitShift: 8) + ch2 + LowercaseBit].

	"Initialize a few others for which we only have lowercase versions."
	#($ $ $ $) do: [:char |
		ch1 := char asciiValue.
		ClassificationTable at: ch1 + 1 put: (ch1 bitShift: 8) + ch1 + LowercaseBit]! !

Character initialize!

!Character methodsFor: 'testing' stamp: 'dgd 8/24/2003 14:50'!
isLetter
	"Answer whether the receiver is a letter."

	^ (ClassificationTable at: value + 1) anyMask: LetterBits! !

!Character methodsFor: 'testing' stamp: 'dgd 8/24/2003 14:51'!
isLowercase
	"Answer whether the receiver is a lowercase letter.
	(The old implementation answered whether the receiver is not an uppercase letter.)"

	^ ((ClassificationTable at: value + 1) bitAnd: LowercaseBit) = LowercaseBit! !

!Character methodsFor: 'testing' stamp: 'dgd 8/24/2003 14:52'!
isSafeForHTTP
	"Answer whether the receiver is 'safe' or needs to be escaped when used, e.g., in a URL. The value < 128 guard keeps international letters, which are now alphanumeric, from being treated as safe."

	^ value < 128 and: [self isAlphaNumeric or: ['.~-_' includes: self]]! !

!Character methodsFor: 'testing' stamp: 'dgd 8/24/2003 14:52'!
isUppercase
	"Answer whether the receiver is an uppercase letter.
	(The old implementation answered whether the receiver is not a lowercase letter.)"

	^ ((ClassificationTable at: value + 1) bitAnd: UppercaseBit) = UppercaseBit! !

!Character methodsFor: 'converting' stamp: 'dgd 8/24/2003 14:53'!
asLowercase
	"If the receiver is uppercase, answer its matching lowercase Character; otherwise answer the receiver."

	^ Character value: ((ClassificationTable at: value + 1) bitAnd: 255)! !

!Character methodsFor: 'converting' stamp: 'dgd 8/24/2003 14:53'!
asUppercase
	"If the receiver is lowercase, answer its matching uppercase Character; otherwise answer the receiver."

	^ Character value: (((ClassificationTable at: value + 1) bitShift: -8) bitAnd: 255)! !

!Scanner class methodsFor: 'class initialization' stamp: 'dgd 8/24/2003 14:55'!
initialize
	"Build the Scanner's type table. Every character for which Character>>isLetter answers true, including the international letters, is classified as #xLetter."

	| newTable |
	newTable _ Array new: 256 withAll: #xBinary. "default"
	newTable atAll: #(9 10 12 13 32 ) put: #xDelimiter. "tab lf ff cr space"
	newTable atAll: ($0 asciiValue to: $9 asciiValue) put: #xDigit.

	1 to: 255 do: [:index |
		(Character value: index) isLetter
			ifTrue: [newTable at: index put: #xLetter]].

	newTable at: 30 put: #doIt.
	newTable at: $" asciiValue put: #xDoubleQuote.
	newTable at: $# asciiValue put: #xLitQuote.
	newTable at: $$ asciiValue put: #xDollar.
	newTable at: $' asciiValue put: #xSingleQuote.
	newTable at: $: asciiValue put: #xColon.
	newTable at: $( asciiValue put: #leftParenthesis.
	newTable at: $) asciiValue put: #rightParenthesis.
	newTable at: $. asciiValue put: #period.
	newTable at: $; asciiValue put: #semicolon.
	newTable at: $[ asciiValue put: #leftBracket.
	newTable at: $] asciiValue put: #rightBracket.
	newTable at: ${ asciiValue put: #leftBrace.
	newTable at: $} asciiValue put: #rightBrace.
	newTable at: $^ asciiValue put: #upArrow.
	newTable at: $_ asciiValue put: #leftArrow.
	newTable at: $| asciiValue put: #verticalBar.
	TypeTable _ newTable "bon voyage!!"

	"Scanner initialize"! !

"Postscript:
Re-initialize String and Scanner so that their tables reflect the new set of letters."
String initialize.
Scanner initialize.
!
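
"Worked example (an illustrative sketch added for readers, not part of the original change set; it assumes only the ASCII values $A = 65, $a = 97 and $1 = 49). Following the bit layout documented in Character class>>initializeClassificationTable, where LowercaseBit = 1 bitShift: 16 = 65536 and UppercaseBit = 1 bitShift: 17 = 131072, the table entries are:
	$A -> (65 bitShift: 8) + 97 + UppercaseBit = 16640 + 97 + 131072 = 147809
	$a -> (65 bitShift: 8) + 97 + LowercaseBit = 16640 + 97 + 65536 = 82273
	$1 -> (49 bitShift: 8) + 49 = 12544 + 49 = 12593 (neither bit is set, so isLetter answers false)
Decoding with asLowercase and asUppercase:
	$A asLowercase -> 147809 bitAnd: 255 = 97, i.e., $a
	$a asUppercase -> (82273 bitShift: -8) bitAnd: 255 = 321 bitAnd: 255 = 65, i.e., $A
	$1 asUppercase -> (12593 bitShift: -8) bitAnd: 255 = 49, i.e., $1 itself
Because both case variants live in one integer per character, each conversion or test is a single table lookup plus a mask."!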