0CodErr
Ziron Guru
Posts: 199
Topics: 37
[1469] - posted: 2015-01-15 12:15:32
I noticed that we can define CHAR_LETTER as a combination of the two case bits: Code:
const CHAR_LETTER     = CHAR_LOWER | CHAR_UPPER;
Then we no longer need: Code:
const CHAR_LETTER_or_CHAR_UPPER       = CHAR_LETTER  | CHAR_UPPER;
const CHAR_LETTER_or_CHAR_LOWER       = CHAR_LETTER  | CHAR_LOWER;
And we get one more free bit for our own needs.
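To make the bit arithmetic concrete, here is a small C sketch (C is used only for illustration; this is not Ziron code) with the same bit values as the updated CharClass.zir below. CHAR_USED_BITS and has_class are hypothetical names introduced here. The static assertions check that the old helper constants reduce to CHAR_LETTER itself:

```c
#include <assert.h>

/* Class bits as in the updated CharClass.zir. */
enum {
    CHAR_DIGIT      = 1 << 0,
    CHAR_UPPER      = 1 << 1,
    CHAR_LOWER      = 1 << 2,
    CHAR_CONTROL    = 1 << 3,
    CHAR_WHITESPACE = 1 << 4,
    CHAR_PUNCT      = 1 << 5,
    CHAR_LETTER     = CHAR_LOWER | CHAR_UPPER  /* composite: uses no bit of its own */
};

/* Hypothetical: every bit taken by a basic class. */
#define CHAR_USED_BITS (CHAR_DIGIT | CHAR_UPPER | CHAR_LOWER | \
                        CHAR_CONTROL | CHAR_WHITESPACE | CHAR_PUNCT)

/* The old helper constants only OR in a bit that CHAR_LETTER already has. */
static_assert((CHAR_LETTER | CHAR_UPPER) == CHAR_LETTER, "CHAR_LETTER_or_CHAR_UPPER is redundant");
static_assert((CHAR_LETTER | CHAR_LOWER) == CHAR_LETTER, "CHAR_LETTER_or_CHAR_LOWER is redundant");

/* Hypothetical helper: nonzero if the class byte has any bit of mask,
   which is the same check "and al, CHAR_LETTER" performs in IsLetter. */
static int has_class(unsigned char cls, unsigned char mask) {
    return (cls & mask) != 0;
}
```

With this layout CHAR_USED_BITS is 0x3F, so bits 6 and 7 (mask 0xC0) of the class byte remain free.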

The changed code now looks like this: Code:
/*
/    Character classification routines (CharClass.zir)
/    Copyright (c) 0CodErr, KolibriOS team
*/
const CHAR_DIGIT      = 1 << 0; // 1b      ; 0 bit
const CHAR_UPPER      = 1 << 1; // 10b     ; 1 bit
const CHAR_LOWER      = 1 << 2; // 100b    ; 2 bit
const CHAR_CONTROL    = 1 << 3; // 1000b   ; 3 bit
const CHAR_WHITESPACE = 1 << 4; // 10000b  ; 4 bit
const CHAR_PUNCT      = 1 << 5; // 100000b ; 5 bit
const CHAR_LETTER     = CHAR_LOWER | CHAR_UPPER;
const CHAR_CONTROL_WHITESPACE = CHAR_CONTROL | CHAR_WHITESPACE;

inline function IsDigit($symbol) {
  gp_ax = 0; al = $symbol;
  al = CHAR_TABLE[gp_ax];
  and al, CHAR_DIGIT;
  $return gp_ax;
}
inline function IsLetter($symbol) {
  gp_ax = 0; al = $symbol;
  al = CHAR_TABLE[gp_ax];
  and al, CHAR_LETTER;
  $return gp_ax;
}
inline function IsUpper($symbol) {
  gp_ax = 0; al = $symbol;
  al = CHAR_TABLE[gp_ax];
  and al, CHAR_UPPER;
  $return gp_ax;
}
inline function IsLower($symbol) {
  gp_ax = 0; al = $symbol;
  al = CHAR_TABLE[gp_ax];
  and al, CHAR_LOWER;
  $return gp_ax;
}
inline function IsControl($symbol) {
  gp_ax = 0; al = $symbol;
  al = CHAR_TABLE[gp_ax];
  and al, CHAR_CONTROL;
  $return gp_ax;
}
inline function IsWhiteSpace($symbol) {
  gp_ax = 0; al = $symbol;
  al = CHAR_TABLE[gp_ax];
  and al, CHAR_WHITESPACE;
  $return gp_ax;
}
inline function IsPunct($symbol) {
  gp_ax = 0; al = $symbol;
  al = CHAR_TABLE[gp_ax];
  and al, CHAR_PUNCT;
  $return gp_ax;
}
inline function GetCharType($symbol) {
  gp_ax = 0; al = $symbol;
  al = CHAR_TABLE[gp_ax];
  $return gp_ax;
}

char CHAR_TABLE[256] = [
// 0..15
CHAR_CONTROL,            // 00 (NUL)
CHAR_CONTROL,            // 01 (SOH)
CHAR_CONTROL,            // 02 (STX)
CHAR_CONTROL,            // 03 (ETX)
CHAR_CONTROL,            // 04 (EOT)
CHAR_CONTROL,            // 05 (ENQ)
CHAR_CONTROL,            // 06 (ACK)
CHAR_CONTROL,            // 07 (BEL)
CHAR_CONTROL,            // 08 (BS)
CHAR_CONTROL_WHITESPACE, // 09 (HT)
CHAR_CONTROL_WHITESPACE, // 0A (LF)
CHAR_CONTROL_WHITESPACE, // 0B (VT)
CHAR_CONTROL_WHITESPACE, // 0C (FF)
CHAR_CONTROL_WHITESPACE, // 0D (CR)
CHAR_CONTROL,            // 0E (SO)
CHAR_CONTROL,            // 0F (SI)
// 16..31                
CHAR_CONTROL,            // 10 (DLE)
CHAR_CONTROL,            // 11 (DC1)
CHAR_CONTROL,            // 12 (DC2)
CHAR_CONTROL,            // 13 (DC3)
CHAR_CONTROL,            // 14 (DC4)
CHAR_CONTROL,            // 15 (NAK)
CHAR_CONTROL,            // 16 (SYN)
CHAR_CONTROL,            // 17 (ETB)
CHAR_CONTROL,            // 18 (CAN)
CHAR_CONTROL,            // 19 (EM)
CHAR_CONTROL,            // 1A (SUB)
CHAR_CONTROL,            // 1B (ESC)
CHAR_CONTROL,            // 1C (FS)
CHAR_CONTROL,            // 1D (GS)
CHAR_CONTROL,            // 1E (RS)
CHAR_CONTROL,            // 1F (US)
// 32..47                
CHAR_WHITESPACE,         // 20 SPACE
CHAR_PUNCT,              // 21 !
CHAR_PUNCT,              // 22 "
CHAR_PUNCT,              // 23 #
CHAR_PUNCT,              // 24 $
CHAR_PUNCT,              // 25 %
CHAR_PUNCT,              // 26 &
CHAR_PUNCT,              // 27 '
CHAR_PUNCT,              // 28 (
CHAR_PUNCT,              // 29 )
CHAR_PUNCT,              // 2A *
CHAR_PUNCT,              // 2B +
CHAR_PUNCT,              // 2C ,
CHAR_PUNCT,              // 2D -
CHAR_PUNCT,              // 2E .
CHAR_PUNCT,              // 2F /
// 48..63                
CHAR_DIGIT,              // 30 0
CHAR_DIGIT,              // 31 1
CHAR_DIGIT,              // 32 2
CHAR_DIGIT,              // 33 3
CHAR_DIGIT,              // 34 4
CHAR_DIGIT,              // 35 5
CHAR_DIGIT,              // 36 6
CHAR_DIGIT,              // 37 7
CHAR_DIGIT,              // 38 8
CHAR_DIGIT,              // 39 9
CHAR_PUNCT,              // 3A :
CHAR_PUNCT,              // 3B ;
CHAR_PUNCT,              // 3C <
CHAR_PUNCT,              // 3D =
CHAR_PUNCT,              // 3E >
CHAR_PUNCT,              // 3F ?
// 64..79                
CHAR_PUNCT,              // 40 @
CHAR_UPPER,              // 41 A
CHAR_UPPER,              // 42 B
CHAR_UPPER,              // 43 C
CHAR_UPPER,              // 44 D
CHAR_UPPER,              // 45 E
CHAR_UPPER,              // 46 F
CHAR_UPPER,              // 47 G
CHAR_UPPER,              // 48 H
CHAR_UPPER,              // 49 I
CHAR_UPPER,              // 4A J
CHAR_UPPER,              // 4B K
CHAR_UPPER,              // 4C L
CHAR_UPPER,              // 4D M
CHAR_UPPER,              // 4E N
CHAR_UPPER,              // 4F O
// 80..95                
CHAR_UPPER,              // 50 P
CHAR_UPPER,              // 51 Q
CHAR_UPPER,              // 52 R
CHAR_UPPER,              // 53 S
CHAR_UPPER,              // 54 T
CHAR_UPPER,              // 55 U
CHAR_UPPER,              // 56 V
CHAR_UPPER,              // 57 W
CHAR_UPPER,              // 58 X
CHAR_UPPER,              // 59 Y
CHAR_UPPER,              // 5A Z
CHAR_PUNCT,              // 5B [
CHAR_PUNCT,              // 5C \
CHAR_PUNCT,              // 5D ]
CHAR_PUNCT,              // 5E ^
CHAR_PUNCT,              // 5F _
// 96..111
CHAR_PUNCT,              // 60 `
CHAR_LOWER,              // 61 a
CHAR_LOWER,              // 62 b
CHAR_LOWER,              // 63 c
CHAR_LOWER,              // 64 d
CHAR_LOWER,              // 65 e
CHAR_LOWER,              // 66 f
CHAR_LOWER,              // 67 g
CHAR_LOWER,              // 68 h
CHAR_LOWER,              // 69 i
CHAR_LOWER,              // 6A j
CHAR_LOWER,              // 6B k
CHAR_LOWER,              // 6C l
CHAR_LOWER,              // 6D m
CHAR_LOWER,              // 6E n
CHAR_LOWER,              // 6F o
// 112..127              
CHAR_LOWER,              // 70 p
CHAR_LOWER,              // 71 q
CHAR_LOWER,              // 72 r
CHAR_LOWER,              // 73 s
CHAR_LOWER,              // 74 t
CHAR_LOWER,              // 75 u
CHAR_LOWER,              // 76 v
CHAR_LOWER,              // 77 w
CHAR_LOWER,              // 78 x
CHAR_LOWER,              // 79 y
CHAR_LOWER,              // 7A z
CHAR_PUNCT,              // 7B {
CHAR_PUNCT,              // 7C |
CHAR_PUNCT,              // 7D }
CHAR_PUNCT,              // 7E ~
CHAR_CONTROL             // 7F (DEL)
];
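For comparison, a rough C equivalent of the table and of IsDigit, IsUpper, etc. might look like this. It is only a sketch: instead of the literal 128-entry list it fills the table with range loops in a hypothetical init_char_table(), and the lowercase function names are illustrative. Like the Ziron versions, each predicate returns the masked class bit rather than a normalised 0/1:

```c
#include <assert.h>
#include <string.h>

/* Class bits as in CharClass.zir. */
enum {
    CHAR_DIGIT      = 1 << 0,
    CHAR_UPPER      = 1 << 1,
    CHAR_LOWER      = 1 << 2,
    CHAR_CONTROL    = 1 << 3,
    CHAR_WHITESPACE = 1 << 4,
    CHAR_PUNCT      = 1 << 5,
    CHAR_LETTER     = CHAR_LOWER | CHAR_UPPER,
    CHAR_CONTROL_WHITESPACE = CHAR_CONTROL | CHAR_WHITESPACE
};

/* One entry per byte value, so any unsigned char is a safe index.
   Entries 0x80..0xFF stay 0 (no class), as in the Ziron table. */
static unsigned char CHAR_TABLE[256];
static_assert(sizeof CHAR_TABLE == 256, "one entry per byte value");

/* Hypothetical helper: same classes as the literal list, built from ranges.
   Later loops overwrite entries set by earlier ones. */
static void init_char_table(void) {
    memset(CHAR_TABLE, 0, sizeof CHAR_TABLE);
    for (int c = 0x00; c <= 0x1F; c++) CHAR_TABLE[c] = CHAR_CONTROL;
    for (int c = 0x09; c <= 0x0D; c++) CHAR_TABLE[c] = CHAR_CONTROL_WHITESPACE; /* HT..CR */
    for (int c = 0x21; c <= 0x7E; c++) CHAR_TABLE[c] = CHAR_PUNCT;
    for (int c = 0x30; c <= 0x39; c++) CHAR_TABLE[c] = CHAR_DIGIT;             /* 0..9 */
    for (int c = 0x41; c <= 0x5A; c++) CHAR_TABLE[c] = CHAR_UPPER;             /* A..Z */
    for (int c = 0x61; c <= 0x7A; c++) CHAR_TABLE[c] = CHAR_LOWER;             /* a..z */
    CHAR_TABLE[0x20] = CHAR_WHITESPACE;                                        /* SPACE */
    CHAR_TABLE[0x7F] = CHAR_CONTROL;                                           /* DEL */
}

/* Like the Ziron functions: return the masked class bits, not 0/1. */
static unsigned is_digit(unsigned char c)      { return CHAR_TABLE[c] & CHAR_DIGIT; }
static unsigned is_letter(unsigned char c)     { return CHAR_TABLE[c] & CHAR_LETTER; }
static unsigned is_upper(unsigned char c)      { return CHAR_TABLE[c] & CHAR_UPPER; }
static unsigned is_whitespace(unsigned char c) { return CHAR_TABLE[c] & CHAR_WHITESPACE; }
static unsigned get_char_type(unsigned char c) { return CHAR_TABLE[c]; }
```

Note that only is_digit can return exactly 1 (CHAR_DIGIT); is_upper(0x41) returns CHAR_UPPER, i.e. 2, so callers are safer testing for nonzero than comparing with 1.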
Admin
Site Admin

Posts: 933
Topics: 55
Location: OverHertz Studio
[1471] - posted: 2015-01-15 12:30:53
Thanks, updated file.

Download Ziron
Get free hosting for Ziron related fan-sites and Ziron projects, contact me in private message.
0CodErr
Ziron Guru
[1473] - posted: 2015-01-17 13:04:29
It seems that the file in the latest release is slightly out of date. For example, it still contains CHAR_LETTER_or_CHAR_LOWER, which we no longer use (see post [1469] above). Maybe just copy it from that post :)
Admin
Site Admin

[1477] - posted: 2015-01-17 14:20:09
Fixed.

0CodErr
Ziron Guru
[1582] - posted: 2015-02-02 09:23:57
I tried to compile this: Code:
program RAW_IMAGE 'test';

#set bits 32;

#include 'charclass.zir';

byte b1; 

b1 = IsDigit(ord('5'));

If (IsDigit(ord('5')) == true) {};


In the latest release, this fails with the error:
Operand 1 is invalid in [IsDigit]
at this line:
Code:
al = CHAR_TABLE[gp_ax];

But it compiled fine with version 20017.

Here is what I noticed in the generated code:
Code:
// b1 = IsDigit(ord('5'));
00000000  33C0              xor eax,eax
00000002  B035              mov al,0x35
00000004  8A800E040000      mov al,[eax+0x40e]
0000000A  2401              and al,0x1
0000000C  A30E050000        mov [0x50e],eax

// If (IsDigit(ord('5')) == true) {};
00000011  33C0              xor eax,eax
00000013  B035              mov al,0x35
00000015  8A800E040000      mov al,[eax+0x40e]
0000001B  2401              and al,0x1
0000001D  85C0              test eax,eax
0000001F  7400              jz 0x21



Instead of Code:
0000001B  2401              and al,0x1
0000001D  85C0              test eax,eax

we could write it shorter as: Code:
test al,0x1
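Why the shorter form is safe here: test al,0x1 sets ZF from bit 0 only, while and al,0x1 followed by test eax,eax also looks at the upper 24 bits of EAX. The two agree only because xor eax,eax has already cleared those bits (and mov al,[...] writes only AL). One more difference: TEST writes no register, so AL keeps the full table byte instead of the masked bit. A toy C model of the zero flag makes both points explicit; EAX is modelled as a uint32_t and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Sequence A: "and al,0x1" then "test eax,eax".
   AND changes only AL (bits 0..7); TEST then sets ZF from all 32 bits.
   Returns ZF and stores the resulting EAX in *eax_out. */
static int zf_and_then_test(uint32_t eax, uint32_t *eax_out) {
    eax = (eax & 0xFFFFFF00u) | (eax & 0x01u);  /* and al,0x1 */
    *eax_out = eax;
    return eax == 0;                            /* test eax,eax */
}

/* Sequence B: "test al,0x1".
   ZF comes from bit 0 only, and no register is written. */
static int zf_test_only(uint32_t eax) {
    return (eax & 0x01u) == 0;
}
```

For every EAX value from 0 to 255 (upper bytes zero) both functions report the same ZF; with 0x100 in EAX they disagree, which is why the shortcut depends on the preceding xor eax,eax.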


If I'm not mistaken, SphinxC-- allows functions to return their result in flags.
Something like this:
Code:
function MyFunc(dword p1) : ZF {} // result in zero flag
function MyFunc(dword p1) : CF {} // result in carry flag
function MyFunc(dword p1) : SF {} // result in sign flag


Then our IsDigit would look like this:
Code:
inline function IsDigit($symbol) : ZF {
  gp_ax = 0; al = $symbol;
  al = CHAR_TABLE[gp_ax];
  test al, CHAR_DIGIT; // and al, CHAR_DIGIT;
  $return ZF; // $return gp_ax;
}



Then, if we only need a comparison, only a jz has to be emitted:
Code:
// If (IsDigit(ord('5')) == true) {};
xor eax,eax
mov al,0x35
mov al,[eax+0x40e]
test al,0x1

jz 0x21


And if we need an assignment, we can use the SETcc instructions:
Code:
// b1 = IsDigit(ord('5'));
xor eax,eax
mov al,0x35
mov al,[eax+0x40e]
test al,0x1

setnz [0x50e] ; ZF = 0 after test means the digit bit is set


The only problem with SETcc is that its destination is always 8 bits wide:
Checks the status flags in the rFLAGS register and, if the flags meet the condition specified in the
mnemonic (cc), sets the value in the specified 8-bit memory location or register to 1.
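To illustrate the 8-bit limitation quoted above: in this example b1 is declared as a byte, so the single-byte write is enough. For a dword destination, the upper bytes would keep whatever was there before unless the compiler zero-extends the result, for example with movzx after the SETcc or by clearing the register before it. A toy C model, with registers as uint32_t and hypothetical helper names (setcc, store_low_byte, store_zero_extended):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of SETcc: it produces a single byte, 1 if the condition holds, else 0. */
static uint8_t setcc(int condition) {
    return condition ? 1 : 0;
}

/* Storing that byte into the low byte of a 32-bit location leaves bits 8..31 as they were. */
static uint32_t store_low_byte(uint32_t dest, uint8_t value) {
    return (dest & 0xFFFFFF00u) | value;
}

/* Zero-extending first (what movzx does) gives a clean 0 or 1 in all 32 bits. */
static uint32_t store_zero_extended(uint8_t value) {
    return (uint32_t)value;
}
```

For instance, store_low_byte(0xDEADBEEF, setcc(1)) yields 0xDEADBE01, while store_zero_extended(setcc(1)) yields 1.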
Admin
Site Admin

[1585] - posted: 2015-02-02 13:16:57
The only problem I foresee is that when programmers call a function, they expect the result to be in eax/ax/al. So imagine this scenario:

Code:
// some code
dl = 50;
// some more code
IsDigit(ord('5'));

if (dl == 50) { //dl is still 50?
  cl = 12;
  dl = 125;
}

if (al == true) {
   // etc
}


A better alternative may be to add extra macros such as testIsDigit(...), etc.

I will hopefully add flag comparison in the next release.

Edit: I have now implemented flag comparison.

Also fixed the invalid operand error.


Code:
inline function IsDigitTest($symbol) {
  gp_ax = 0; al = $symbol;
  al = CHAR_TABLE[gp_ax];
  test al, CHAR_DIGIT;
  $return zero_flag; //or you can use ZF
}

If (IsDigitTest(ord('1')) == true) {
  println('IS digit!');
}


0CodErr
Ziron Guru
[1588] - posted: 2015-02-02 18:05:11
Oh, good :)

If I understand correctly, it is also possible to write:
Code:
$return ZF;


Also, I noticed that we can make our code even shorter :)
But I don't know whether Ziron supports this yet.
Code:
inline function IsDigitTest($symbol) {
  gp_ax = 0; al = $symbol;
  test CHAR_TABLE[gp_ax], CHAR_DIGIT;
  $return ZF;
}


One more question: do we need to add new functions such as IsDigitTest, IsLetterTest, etc. right now? The idea was to use ONE function for TWO things: comparison and assignment.

As for Code:
IsDigit(ord('5'));
// some code
if (al == true) {
   // etc
}


If we define our function as returning ZF, then the correct code would be:
Code:
al = IsDigit(ord('5'));
// some code
if (al == true) {
   // etc
}

I don't see why this is a problem. We already have many things that differ, for example calling conventions, and that is not a problem either. Just my opinion :)
Admin
Site Admin

[1589] - posted: 2015-02-02 18:09:42
The problem is that writing "al =" is not mandatory (and many people don't write it):

Code:
IsDigit(ord('5'));
// some code
if (al == true) {
   // etc
}


Still correct ^


And yes

Code:
inline function IsDigitTest($symbol) {
  gp_ax = 0; al = $symbol;
  test CHAR_TABLE[gp_ax], CHAR_DIGIT;
  $return ZF;
}


This is also valid :)

So when the macro is used in a conditional statement (or assigned to zf), it can receive a flag telling it that the result is needed for comparison; in all other cases it gets no flag.

For example:

Code:
zf = IsDigit(..  // as flag
al = IsDigit(..  // no flag
IsDigit(..       // no flag
if (IsDigit(...  // as flag


What do you think?

const Copyright = '2011-2024 © OverHertz Ltd. All rights reserved.';
Web development by OverHertz Ltd