'From Squeak3.8gamma of ''24 November 2004'' [latest update: #6643] on 10 April 2005 at 7:16:25 pm'! "Change Set: UnifyMultiChar Date: 9 April 2005 Author: Andreas Raab This change sets obsoletes MultiCharacter by folding it with Character. Note that it does not remove MultiCharacter itself since this needs to be done so that all instances are converted to regular characters before we remove it."! !Character commentStamp: 'ar 4/9/2005 22:35' prior: 0! I represent a character by storing its associated Unicode. The first 256 characters are created uniquely, so that all instances of latin1 characters ($R, for example) are identical. The code point is based on Unicode. Since Unicode is 21-bit wide character set, we have several bits available for other information. As the Unicode Standard states, a Unicode code point doesn't carry the language information. This is going to be a problem with the languages so called CJK (Chinese, Japanese, Korean. Or often CJKV including Vietnamese). Since the characters of those languages are unified and given the same code point, it is impossible to display a bare Unicode code point in an inspector or such tools. To utilize the extra available bits, we use them for identifying the languages. Since the old implementation uses the bits to identify the character encoding, the bits are sometimes called "encoding tag" or neutrally "leading char", but the bits rigidly denotes the concept of languages. The other languages can have the language tag if you like. This will help to break the large default font (font set) into separately loadable chunk of fonts. However, it is open to the each native speakers and writers to decide how to define the character equality, since the same Unicode code point may have different language tag thus simple #= comparison may return false. I represent a character by storing its associated ASCII code (extended to 256 codes). My instances are created uniquely, so that all instances of a character ($R, for example) are identical.! !AbstractString class methodsFor: 'instance creation' stamp: 'ar 4/9/2005 21:50'! with: aCharacter | newCollection | aCharacter asInteger < 256 ifTrue:[newCollection _ String new: 1] ifFalse:[newCollection _ MultiString new: 1]. newCollection at: 1 put: aCharacter. ^newCollection! ! !AbstractString class methodsFor: 'initialization' stamp: 'ar 4/9/2005 22:37'! initialize "self initialize" | order | AsciiOrder _ (0 to: 255) as: ByteArray. CaseInsensitiveOrder _ AsciiOrder copy. ($a to: $z) do: [:c | CaseInsensitiveOrder at: c asciiValue + 1 put: (CaseInsensitiveOrder at: c asUppercase asciiValue +1)]. "Case-sensitive compare sorts space, digits, letters, all the rest..." CaseSensitiveOrder _ ByteArray new: 256 withAll: 255. order _ -1. ' 0123456789' do: "0..10" [:c | CaseSensitiveOrder at: c asciiValue + 1 put: (order _ order+1)]. ($a to: $z) do: "11-64" [:c | CaseSensitiveOrder at: c asUppercase asciiValue + 1 put: (order _ order+1). CaseSensitiveOrder at: c asciiValue + 1 put: (order _ order+1)]. 1 to: CaseSensitiveOrder size do: [:i | (CaseSensitiveOrder at: i) = 255 ifTrue: [CaseSensitiveOrder at: i put: (order _ order+1)]]. order = 255 ifFalse: [self error: 'order problem']. "a table for translating to lower case" LowercasingTable _ String withAll: (Character allByteCharacters collect: [:c | c asLowercase]). "a table for translating to upper case" UppercasingTable _ String withAll: (Character allByteCharacters collect: [:c | c asUppercase]). "a table for testing tokenish (for fast numArgs)" Tokenish _ String withAll: (Character allByteCharacters collect: [:c | c tokenish ifTrue: [c] ifFalse: [$~]]). "CR and LF--characters that terminate a line" CSLineEnders _ CharacterSet empty. CSLineEnders add: Character cr. CSLineEnders add: Character lf. "separators and non-separators" CSSeparators _ CharacterSet separators. CSNonSeparators _ CSSeparators complement.! ! !Character methodsFor: 'comparing' stamp: 'ar 4/9/2005 21:48'! = aCharacter "Primitive. Answer true if the receiver and the argument are the same object (have the same object pointer) and false otherwise. Optional. See Object documentation whatIsAPrimitive." ^ self == aCharacter or:[ aCharacter isCharacter and: [self asciiValue = aCharacter asciiValue]]! ! !Character methodsFor: 'testing' stamp: 'ar 4/9/2005 22:15'! isUnicode ^ ((EncodedCharSet charsetAt: self leadingChar) isKindOf: LanguageEnvironment class).! ! !Character methodsFor: 'printing' stamp: 'ar 4/9/2005 21:53'! hex ^value hex! ! !Character methodsFor: 'printing' stamp: 'ar 4/9/2005 22:30'! storeBinaryOn: aStream "Store the receiver on a binary (file) stream" value < 256 ifTrue:[aStream basicNextPut: self] ifFalse:[Stream nextInt32Put: value].! ! !Character methodsFor: 'converting' stamp: 'ar 4/9/2005 21:51'! asUnicode | table charset v | self leadingChar = 0 ifTrue: [^ value]. charset _ EncodedCharSet charsetAt: self leadingChar. charset isCharset ifFalse: [^ self charCode]. table _ charset ucsTable. table isNil ifTrue: [^ 16rFFFD]. v _ table at: self charCode + 1. v = -1 ifTrue: [^ 16rFFFD]. ^ v. ! ! !Character methodsFor: 'converting' stamp: 'ar 4/9/2005 22:25'! asUnicodeChar "@@@ FIXME: Make this use asUnicode and move it to its lonely sender @@@" | table charset v | self leadingChar = 0 ifTrue: [^ value]. charset _ EncodedCharSet charsetAt: self leadingChar. charset isCharset ifFalse: [^ self]. table _ charset ucsTable. table isNil ifTrue: [^ Character value: 16rFFFD]. v _ table at: self charCode + 1. v = -1 ifTrue: [^ Character value: 16rFFFD]. ^ Character leadingChar: charset unicodeLeadingChar code: v.! ! !Character methodsFor: 'private' stamp: 'ar 4/9/2005 22:18'! setValue: newValue value ifNotNil:[^self error:'Characters are immutable']. value _ newValue.! ! !Character class methodsFor: 'instance creation' stamp: 'ar 4/9/2005 22:36'! allByteCharacters "Answer all the characters that can be encoded in a byte" ^ (0 to: 255) collect: [:v | Character value: v] ! ! !Character class methodsFor: 'instance creation' stamp: 'ar 4/9/2005 22:37'! allCharacters "This name is obsolete since only the characters that will fit in a byte can be queried" ^self allByteCharacters ! ! !Character class methodsFor: 'instance creation' stamp: 'ar 4/9/2005 22:24'! leadingChar: leadChar code: code code >= 16r400000 ifTrue: [ self error: 'code is out of range'. ]. leadChar >= 256 ifTrue: [ self error: 'lead is out of range'. ]. ^self value: (leadChar bitShift: 22) + code.! ! !Character class methodsFor: 'instance creation' stamp: 'ar 4/9/2005 22:19'! value: anInteger "Answer the Character whose value is anInteger." anInteger > 255 ifTrue: [^self basicNew setValue: anInteger]. ^ CharacterTable at: anInteger + 1. ! ! !CharacterSet methodsFor: 'collection ops' stamp: 'ar 4/9/2005 22:37'! do: aBlock "evaluate aBlock with each character in the set" Character allByteCharacters do: [ :c | (self includes: c) ifTrue: [ aBlock value: c ] ] ! ! !CompoundTextConverter methodsFor: 'conversion' stamp: 'ar 4/9/2005 22:25'! nextFromStream: aStream | character character2 size leadingChar offset result | aStream isBinary ifTrue: [^ aStream basicNext]. character _ aStream basicNext. character ifNil: [^ nil]. character == Character escape ifTrue: [ self parseShiftSeqFromStream: aStream. character _ aStream basicNext. character ifNil: [^ nil]]. character asciiValue < 128 ifTrue: [ size _ state g0Size. leadingChar _ state g0Leading. offset _ 16r21. ] ifFalse: [ size _state g1Size. leadingChar _ state g1Leading. offset _ 16rA1. ]. size = 1 ifTrue: [ leadingChar = 0 ifTrue: [^ character] ifFalse: [^ Character leadingChar: leadingChar code: character asciiValue] ]. size = 2 ifTrue: [ character2 _ aStream basicNext. character2 ifNil: [^ nil. "self errorMalformedInput"]. character _ character asciiValue - offset. character2 _ character2 asciiValue - offset. result _ Character leadingChar: leadingChar code: character * 94 + character2. ^ result asUnicodeChar. "^ self toUnicode: result" ]. self error: 'unsupported encoding'. ! ! !CompoundTextConverter methodsFor: 'conversion' stamp: 'ar 4/9/2005 22:28'! nextPut: aCharacter toStream: aStream | ascii leadingChar class | aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream]. aCharacter isUnicode ifTrue: [ class _ (EncodedCharSet charsetAt: aCharacter leadingChar) traditionalCharsetClass. ascii _ (class charFromUnicode: aCharacter asUnicode) charCode. leadingChar _ class leadingChar. ] ifFalse: [ ascii _ aCharacter charCode. leadingChar _ aCharacter leadingChar. ]. self nextPutValue: ascii toStream: aStream withShiftSequenceIfNeededForLeadingChar: leadingChar. ! ! !CP1250TextConverter methodsFor: 'conversion' stamp: 'ar 4/9/2005 22:28'! nextPut: aCharacter toStream: aStream aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream]. aCharacter charCode < 128 ifTrue: [ aStream basicNextPut: aCharacter. ] ifFalse: [ aStream basicNextPut: ((Character value: (self fromSqueak: aCharacter) charCode)). ]. ! ! !CP1250TextConverter methodsFor: 'private' stamp: 'ar 4/9/2005 22:24'! toSqueak: char | value | value _ char charCode. value < 129 ifTrue: [^ char]. value > 255 ifTrue: [^ char]. ^ Character leadingChar: Latin2Environment leadingChar code: (#( 16r0081 16r201A 16r0083 16r201E 16r2026 16r2020 16r2021 16r0088 16r2030 16r0160 16r2039 16r015A 16r0164 16r017D 16r0179 16r0090 16r2018 16r2019 16r201C 16r201D 16r2022 16r2013 16r2014 16r0098 16r2122 16r0161 16r203A 16r015B 16r0165 16r017E 16r017A 16r00A0 16r02C7 16r02D8 16r0141 16r00A4 16r0104 16r00A6 16r00A7 16r00A8 16r00A9 16r015E 16r00AB 16r00AC 16r00AD 16r00AE 16r017B 16r00B0 16r00B1 16r02DB 16r0142 16r00B4 16r00B5 16r00B6 16r00B7 16r00B8 16r0105 16r015F 16r00BB 16r013D 16r02DD 16r013E 16r017C 16r0154 16r00C1 16r00C2 16r0102 16r00C4 16r0139 16r0106 16r00C7 16r010C 16r00C9 16r0118 16r00CB 16r011A 16r00CD 16r00CE 16r010E 16r0110 16r0143 16r0147 16r00D3 16r00D4 16r0150 16r00D6 16r00D7 16r0158 16r016E 16r00DA 16r0170 16r00DC 16r00DD 16r0162 16r00DF 16r0155 16r00E1 16r00E2 16r0103 16r00E4 16r013A 16r0107 16r00E7 16r010D 16r00E9 16r0119 16r00EB 16r011B 16r00ED 16r00EE 16r010F 16r0111 16r0144 16r0148 16r00F3 16r00F4 16r0151 16r00F6 16r00F7 16r0159 16r016F 16r00FA 16r0171 16r00FC 16r00FD 16r0163 16r02D9 ) at: (value - 129 + 1)). ! ! !CP1253TextConverter methodsFor: 'private' stamp: 'ar 4/9/2005 22:24'! toSqueak: char | value | value _ char charCode. value < 128 ifTrue: [^ char]. value > 255 ifTrue: [^ char]. ^ Character leadingChar: GreekEnvironment leadingChar code: (#( 16r20AC 16rFFFD 16r201A 16r0192 16r201E 16r2026 16r2020 16r2021 16rFFFD 16r2030 16rFFFD 16r2039 16rFFFD 16rFFFD 16rFFFD 16rFFFD 16rFFFD 16r2018 16r2019 16r201C 16r201D 16r2022 16r2013 16r2014 16rFFFD 16r2122 16rFFFD 16r203A 16rFFFD 16rFFFD 16rFFFD 16rFFFD 16r00A0 16r0385 16r0386 16r00A3 16r00A4 16r00A5 16r00A6 16r00A7 16r00A8 16r00A9 16rFFFD 16r00AB 16r00AC 16r00AD 16r00AE 16r2015 16r00B0 16r00B1 16r00B2 16r00B3 16r0384 16r00B5 16r00B6 16r00B7 16r0388 16r0389 16r038A 16r00BB 16r038C 16r00BD 16r038E 16r038F 16r0390 16r0391 16r0392 16r0393 16r0394 16r0395 16r0396 16r0397 16r0398 16r0399 16r039A 16r039B 16r039C 16r039D 16r039E 16r039F 16r03A0 16r03A1 16rFFFD 16r03A3 16r03A4 16r03A5 16r03A6 16r03A7 16r03A8 16r03A9 16r03AA 16r03AB 16r03AC 16r03AD 16r03AE 16r03AF 16r03B0 16r03B1 16r03B2 16r03B3 16r03B4 16r03B5 16r03B6 16r03B7 16r03B8 16r03B9 16r03BA 16r03BB 16r03BC 16r03BD 16r03BE 16r03BF 16r03C0 16r03C1 16r03C2 16r03C3 16r03C4 16r03C5 16r03C6 16r03C7 16r03C8 16r03C9 16r03CA 16r03CB 16r03CC 16r03CD 16r03CE 16rFFFD ) at: (value - 128 + 1)). ! ! !EncodedCharSet class methodsFor: 'class methods' stamp: 'ar 4/9/2005 22:30'! charFromUnicode: unicode | table index | unicode < 256 ifTrue: [^ Character value: unicode]. table _ self ucsTable. index _ table indexOf: unicode. index = 0 ifTrue: [ ^ nil. ]. ^ Character leadingChar: self leadingChar code: index - 1. ! ! !EUCTextConverter methodsFor: 'conversion' stamp: 'ar 4/9/2005 22:25'! nextFromStream: aStream | character1 character2 offset value1 value2 nonUnicodeChar | aStream isBinary ifTrue: [^ aStream basicNext]. character1 _ aStream basicNext. character1 isNil ifTrue: [^ nil]. character1 asciiValue <= 127 ifTrue: [^ character1]. character2 _ aStream basicNext. character2 = nil ifTrue: [^ nil]. offset _ 16rA1. value1 _ character1 asciiValue - offset. value2 _ character2 asciiValue - offset. (value1 < 0 or: [value1 > 93]) ifTrue: [^ nil]. (value2 < 0 or: [value2 > 93]) ifTrue: [^ nil]. nonUnicodeChar _ Character leadingChar: self leadingChar code: value1 * 94 + value2. ^ Character leadingChar: self languageEnvironment leadingChar code: nonUnicodeChar asUnicode. ! ! !EUCTextConverter methodsFor: 'conversion' stamp: 'ar 4/9/2005 22:27'! nextPut: aCharacter toStream: aStream | value leadingChar nonUnicodeChar value1 value2 | aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream]. value _ aCharacter charCode. leadingChar _ aCharacter leadingChar. (leadingChar = 0 and: [value < 128]) ifTrue: [ aStream basicNextPut: (Character value: value). ^ aStream ]. (128 <= value and: [value < 256]) ifTrue: [^ aStream]. aCharacter isUnicode ifTrue: [ nonUnicodeChar _ self nonUnicodeClass charFromUnicode: value. ] ifFalse: [ nonUnicodeChar _(Character value: value) ]. nonUnicodeChar ifNotNil: [ value _ nonUnicodeChar charCode. value1 _ value // 94 + 161. value2 _ value \\ 94 + 161. aStream basicNextPut: (Character value: value1). aStream basicNextPut: (Character value: value2). ^ aStream ] ! ! !Integer methodsFor: 'converting' stamp: 'ar 4/9/2005 22:31'! asCharacter "Answer the Character whose value is the receiver." ^Character value: self! ! !ISO88592TextConverter methodsFor: 'conversion' stamp: 'ar 4/9/2005 22:28'! nextPut: aCharacter toStream: aStream aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream]. aCharacter charCode < 128 ifTrue: [ aStream basicNextPut: aCharacter. ] ifFalse: [ aStream basicNextPut: ((Character value: (self fromSqueak: aCharacter) charCode)). ]. ! ! !ISO88592TextConverter methodsFor: 'private' stamp: 'ar 4/9/2005 22:30'! toSqueak: char | value | value _ char charCode. value < 160 ifTrue: [^ char]. value > 255 ifTrue: [^ char]. ^ Character leadingChar: Latin2Environment leadingChar code: (#( 16r00A0 16r0104 16r02D8 16r0141 16r00A4 16r013D 16r015A 16r00A7 16r00A8 16r0160 16r015E 16r0164 16r0179 16r00AD 16r017D 16r017B 16r00B0 16r0105 16r02DB 16r0142 16r00B4 16r013E 16r015B 16r02C7 16r00B8 16r0161 16r015F 16r0165 16r017A 16r02DD 16r017E 16r017C 16r0154 16r00C1 16r00C2 16r0102 16r00C4 16r0139 16r0106 16r00C7 16r010C 16r00C9 16r0118 16r00CB 16r011A 16r00CD 16r00CE 16r010E 16r0110 16r0143 16r0147 16r00D3 16r00D4 16r0150 16r00D6 16r00D7 16r0158 16r016E 16r00DA 16r0170 16r00DC 16r00DD 16r0162 16r00DF 16r0155 16r00E1 16r00E2 16r0103 16r00E4 16r013A 16r0107 16r00E7 16r010D 16r00E9 16r0119 16r00EB 16r011B 16r00ED 16r00EE 16r010F 16r0111 16r0144 16r0148 16r00F3 16r00F4 16r0151 16r00F6 16r00F7 16r0159 16r016F 16r00FA 16r0171 16r00FC 16r00FD 16r0163 16r02D9 ) at: (value - 160 + 1)). ! ! !ISO88597TextConverter methodsFor: 'conversion' stamp: 'ar 4/9/2005 22:28'! nextPut: aCharacter toStream: aStream aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream]. aCharacter charCode < 128 ifTrue: [ aStream basicNextPut: aCharacter. ] ifFalse: [ aStream basicNextPut: ((Character value: (self fromSqueak: aCharacter) charCode)). ]. ! ! !ISO88597TextConverter methodsFor: 'private' stamp: 'ar 4/9/2005 22:30'! toSqueak: char | value | value _ char charCode. value < 160 ifTrue: [^ char]. value > 255 ifTrue: [^ char]. ^ Character leadingChar: GreekEnvironment leadingChar code: (#( 16r00A0 16r2018 16r2019 16r00A3 16r20AC 16r20AF 16r00A6 16r00A7 16r00A8 16r00A9 16r037A 16r00AB 16r00AC 16r00AD 16rFFFD 16r2015 16r00B0 16r00B1 16r00B2 16r00B3 16r0384 16r0385 16r0386 16r00B7 16r0388 16r0389 16r038A 16r00BB 16r038C 16r00BD 16r038E 16r038F 16r0390 16r0391 16r0392 16r0393 16r0394 16r0395 16r0396 16r0397 16r0398 16r0399 16r039A 16r039B 16r039C 16r039D 16r039E 16r039F 16r03A0 16r03A1 16rFFFD 16r03A3 16r03A4 16r03A5 16r03A6 16r03A7 16r03A8 16r03A9 16r03AA 16r03AB 16r03AC 16r03AD 16r03AE 16r03AF 16r03B0 16r03B1 16r03B2 16r03B3 16r03B4 16r03B5 16r03B6 16r03B7 16r03B8 16r03B9 16r03BA 16r03BB 16r03BC 16r03BD 16r03BE 16r03BF 16r03C0 16r03C1 16r03C2 16r03C3 16r03C4 16r03C5 16r03C6 16r03C7 16r03C8 16r03C9 16r03CA 16r03CB 16r03CC 16r03CD 16r03CE 16rFFFD ) at: (value - 160 + 1)). ! ! !JapaneseEnvironment class methodsFor: 'language methods' stamp: 'ar 4/9/2005 22:31'! fromJISX0208String: aString ^ aString collect: [:each | Character leadingChar: JapaneseEnvironment leadingChar code: (each asUnicode)]. ! ! !JISX0208 class methodsFor: 'class methods' stamp: 'ar 4/9/2005 22:31'! charAtKuten: anInteger | a b | a _ anInteger \\ 100. b _ anInteger // 100. (a > 94) | (b > 94) ifTrue: [ self error: 'character code is not valid'. ]. ^ Character leadingChar: self leadingChar code: ((b - 1) * 94) + a - 1. ! ! !MacRomanTextConverter methodsFor: 'conversion' stamp: 'ar 4/10/2005 16:05'! nextPut: aCharacter toStream: aStream aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream]. aStream basicNextPut: aCharacter squeakToMac. ! ! !MultiString methodsFor: 'accessing' stamp: 'ar 4/10/2005 17:14'! at: index "Answer the Character stored in the field of the receiver indexed by the argument." ^ Character value: (self wordAt: index). ! ! !MultiString methodsFor: 'private' stamp: 'ar 4/9/2005 22:31'! mutateJISX0208StringToUnicode | c | 1 to: self size do: [:i | c _ self at: i. (c leadingChar = JISX0208 leadingChar or: [ c leadingChar = (JISX0208 leadingChar bitShift: 2)]) ifTrue: [ self basicAt: i put: (Character leadingChar: JapaneseEnvironment leadingChar code: (c asUnicode)) asciiValue. ] ]. ! ! !PositionableStream methodsFor: 'fileIn/Out' stamp: 'ar 4/9/2005 22:31'! decodeString: string andRuns: runsRaw | strm runLength runValues newString index | strm _ ReadStream on: runsRaw from: 1 to: runsRaw size. (strm peekFor: $( ) ifFalse: [^ nil]. runLength _ OrderedCollection new. [strm skipSeparators. strm peekFor: $)] whileFalse: [runLength add: (Number readFrom: strm)]. runValues _ OrderedCollection new. [strm atEnd not] whileTrue: [runValues add: (Number readFrom: strm). strm next.]. newString _ MultiString new: string size. index _ 1. runLength with: runValues do: [:length :leadingChar | index to: index + length - 1 do: [:pos | newString at: pos put: (Character leadingChar: leadingChar code: (string at: pos) charCode). ]. index _ index + length. ]. ^ newString. ! ! !ShiftJISTextConverter methodsFor: 'conversion' stamp: 'ar 4/9/2005 22:31'! nextFromStream: aStream | character1 character2 value1 value2 char1Value result | aStream isBinary ifTrue: [^ aStream basicNext]. character1 _ aStream basicNext. character1 isNil ifTrue: [^ nil]. char1Value _ character1 asciiValue. (char1Value < 16r81) ifTrue: [^ character1]. (char1Value > 16rA0 and: [char1Value < 16rE0]) ifTrue: [^ self katakanaValue: char1Value]. character2 _ aStream basicNext. character2 = nil ifTrue: [^ nil "self errorMalformedInput"]. value1 _ character1 asciiValue. character1 asciiValue >= 224 ifTrue: [value1 _ value1 - 64]. value1 _ value1 - 129 bitShift: 1. value2 _ character2 asciiValue. character2 asciiValue >= 128 ifTrue: [value2 _ value2 - 1]. character2 asciiValue >= 158 ifTrue: [ value1 _ value1 + 1. value2 _ value2 - 158 ] ifFalse: [value2 _ value2 - 64]. result _ Character leadingChar: self leadingChar code: value1 * 94 + value2. ^ self toUnicode: result ! ! !ShiftJISTextConverter methodsFor: 'conversion' stamp: 'ar 4/9/2005 22:29'! nextPut: aCharacter toStream: aStream | value leadingChar aChar | aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream]. aCharacter isUnicode ifFalse: [ aChar _ aCharacter. value _ aCharacter charCode. ] ifTrue: [ value _ aCharacter charCode. (16rFF61 <= value and: [value <= 16rFF9F]) ifTrue: [ aStream basicNextPut: (self sjisKatakanaFor: value). ^ aStream ]. aChar _ JISX0208 charFromUnicode: value. aChar ifNil: [^ aStream]. value _ aChar charCode. ]. leadingChar _ aChar leadingChar. leadingChar = 0 ifTrue: [ aStream basicNextPut: (Character value: value). ^ aStream. ]. leadingChar == self leadingChar ifTrue: [ | upper lower | upper _ value // 94 + 33. lower _ value \\ 94 + 33. upper \\ 2 == 1 ifTrue: [ upper _ upper + 1 / 2 + 112. lower _ lower + 31 ] ifFalse: [ upper _ upper / 2 + 112. lower _ lower + 125 ]. upper >= 160 ifTrue: [upper _ upper + 64]. lower >= 127 ifTrue: [lower _ lower + 1]. aStream basicNextPut: (Character value: upper). aStream basicNextPut: (Character value: lower). ^ aStream ]. ! ! !ShiftJISTextConverter methodsFor: 'private' stamp: 'ar 4/9/2005 22:31'! katakanaValue: code ^ Character leadingChar: JapaneseEnvironment leadingChar code: (#( 16rFFFD 16rFF61 16rFF62 16rFF63 16rFF64 16rFF65 16rFF66 16rFF67 16rFF68 16rFF69 16rFF6A 16rFF6B 16rFF6C 16rFF6D 16rFF6E 16rFF6F 16rFF70 16rFF71 16rFF72 16rFF73 16rFF74 16rFF75 16rFF76 16rFF77 16rFF78 16rFF79 16rFF7A 16rFF7B 16rFF7C 16rFF7D 16rFF7E 16rFF7F 16rFF80 16rFF81 16rFF82 16rFF83 16rFF84 16rFF85 16rFF86 16rFF87 16rFF88 16rFF89 16rFF8A 16rFF8B 16rFF8C 16rFF8D 16rFF8E 16rFF8F 16rFF90 16rFF91 16rFF92 16rFF93 16rFF94 16rFF95 16rFF96 16rFF97 16rFF98 16rFF99 16rFF9A 16rFF9B 16rFF9C 16rFF9D 16rFF9E 16rFF9F ) at: (code - 16r9F)). ! ! !ShiftJISTextConverter methodsFor: 'private' stamp: 'ar 4/9/2005 22:31'! toUnicode: aChar ^ Character leadingChar: JapaneseEnvironment leadingChar code: aChar asUnicode. ! ! !String methodsFor: 'accessing' stamp: 'ar 4/9/2005 22:22'! at: index put: aCharacter "Primitive. Store the Character in the field of the receiver indicated by the index. Fail if the index is not an Integer or is out of bounds, or if the argument is not a Character. Essential. See Object documentation whatIsAPrimitive." aCharacter isCharacter ifFalse:[^self errorImproperStore]. aCharacter isOctetCharacter ifFalse:[ "Convert to MultiByteString" self becomeForward: (MultiString from: self). ^self at: index put: aCharacter. ]. index isInteger ifTrue: [self errorSubscriptBounds: index] ifFalse: [self errorNonIntegerIndex]! ! !Unicode class methodsFor: 'instance creation' stamp: 'ar 4/9/2005 22:33'! charFromUnicode: uniCode ^ Character leadingChar: self leadingChar code: uniCode ! ! !Unicode class methodsFor: 'instance creation' stamp: 'ar 4/9/2005 22:33'! value: code | l | code < 256 ifTrue: [^ Character value: code]. l _ Locale currentPlatform languageEnvironment leadingChar. l = 0 ifTrue: [l _ 255]. ^ Character leadingChar: l code: code. ! ! !UTF8TextConverter methodsFor: 'conversion' stamp: 'ar 4/9/2005 22:29'! nextPut: aCharacter toStream: aStream | leadingChar nBytes mask shift ucs2code | aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream]. leadingChar _ aCharacter leadingChar. (leadingChar = 0 and: [aCharacter asciiValue < 128]) ifTrue: [ aStream basicNextPut: aCharacter. ^ aStream. ]. "leadingChar > 3 ifTrue: [^ aStream]." ucs2code _ aCharacter asUnicode. ucs2code ifNil: [^ aStream]. nBytes _ ucs2code highBit + 3 // 5. mask _ #(128 192 224 240 248 252 254 255) at: nBytes. shift _ nBytes - 1 * -6. aStream basicNextPut: (Character value: (ucs2code bitShift: shift) + mask). 2 to: nBytes do: [:i | shift _ shift + 6. aStream basicNextPut: (Character value: ((ucs2code bitShift: shift) bitAnd: 63) + 128). ]. ^ aStream. ! ! !WriteStream methodsFor: 'accessing' stamp: 'ar 4/10/2005 19:13'! nextPut: anObject "Primitive. Insert the argument at the next position in the Stream represented by the receiver. Fail if the collection of this stream is not an Array or a String. Fail if the stream is positioned at its end, or if the position is out of bounds in the collection. Fail if the argument is not of the right type for the collection. Optional. See Object documentation whatIsAPrimitive." self flag: #ByteString. ((collection class == String) and: [ anObject isCharacter and:[anObject isOctetCharacter not]]) ifTrue: [ collection _ (MultiString from: collection). ^self nextPut: anObject. ]. position >= writeLimit ifTrue: [^ self pastEndPut: anObject] ifFalse: [position _ position + 1. ^collection at: position put: anObject]! ! String class removeSelector: #multiClasses! MultiCharacter class removeSelector: #allCharacters! MultiCharacter class removeSelector: #from:! MultiCharacter class removeSelector: #leadingChar:code:! MultiCharacter class removeSelector: #value:! MultiCharacter removeSelector: #asCharacter! MultiCharacter removeSelector: #asString! MultiCharacter removeSelector: #asUnicode! MultiCharacter removeSelector: #asUnicodeChar! MultiCharacter removeSelector: #hash! MultiCharacter removeSelector: #hex! MultiCharacter removeSelector: #isoToSqueak! MultiCharacter removeSelector: #isUnicode! MultiCharacter removeSelector: #squeakToIso! MultiCharacter removeSelector: #value:! MultiCharacter removeSelector: #=! AbstractString initialize!