C / C++ / MFC

 
General: Re: IID_IHTMLElement Failing
Abhi Lahare, 16-Apr-07 21:55
General: Re: IID_IHTMLElement Failing
Mark Salsbery, 17-Apr-07 5:15
General: Re: IID_IHTMLElement Failing
Abhi Lahare, 19-Apr-07 2:59
Answer: Re: IID_IHTMLElement Failing
Mark Salsbery, 16-Apr-07 10:26
Answer: Re: IID_IHTMLElement Failing
Stephen Hewitt, 16-Apr-07 14:23
General: Re: IID_IHTMLElement Failing [modified]
Abhi Lahare, 16-Apr-07 19:51
Question: TBSTYLE_EX_DRAWDDARROWS
bob16972, 16-Apr-07 5:50
Question: MBCS - Character range generation
John R. Shaw, 16-Apr-07 5:36
Note: If you are not an expert on standard C++ or the STL, you may want to skip this one.

Restrictions:
1) Only standard C++ may be used.
2) Cannot use the ‘use_facet’ template, because it is not correctly defined by all vendors (namely VC6.0).
3) Cannot use any vendor-specific extensions, such as ‘_ismbtrail(…)’.
4) No knowledge of the code-page may be used.

I need to generate the missing characters in a range of multi-byte characters. My current implementation restricts sequences of characters to those with the same lead-byte, but I wish to eliminate that and ensure that no generated characters are invalid.

1) Example (Japanese: shiftjis):
Input range: Fullwidth digits “[0-9]” -> “[\x82\x4F-\x82\x58]”
Output sequence: “\x82\x4F\x82\x50\x82\x51\x82\x52\x82\x53\x82\x54\x82\x55\x82\x56\x82\x57\x82\x58”

That sequence is easy to generate because the lead byte is the same for all those characters and there are no gaps (invalid characters) in the sequence. The problem is that this places restrictions on the user, requiring them to know that all characters in the range have the same lead byte and that there are no gaps.

2) Example of lead-byte problem (Japanese: shiftjis):
Input range: “[\x82\xF1-\x83\x40]” -> [“Hiragana Letter N” – “Katakana Letter Small A”]
Output sequence: “\x82\xF1\x83\x40”
Unicode equivalent: “\x3093\x30A1”

In the above example it does not matter whether a user is likely to enter that range; the code must be able to create the valid output sequence. As you can see, the gap exists in both ShiftJis (MBCS) and Unicode.

3) Example (Japanese: shiftjis):
Input range: “[\x81\xE0-\x81\xDF]” -> [“Approximately Equal To Or The Image Of” – “Identical To”]
Output sequence: “\x81\xE0\x81\xDF”
Unicode equivalent: “\x2252\x2261”

In the above example both lead bytes are the same, but there exists a gap of 16 invalid characters. This situation is easier to handle because I can actually use ‘mblen’ to check for valid 2-byte sequences, although I do not like this method.
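
The ‘mblen’ check mentioned above can be sketched with the standard library alone. This is a minimal sketch, not the ‘mb_len’ helper used in the implementation further down (its definition is not shown in this post). The name ‘is_single_mb_char’ is hypothetical, and the sketch assumes the C locale has already been set to the target multi-byte encoding with ‘setlocale’.

```cpp
#include <cstddef>   // std::size_t, NULL
#include <cstdlib>   // std::mblen
#include <string>

// Hypothetical helper: true if `s` is exactly one complete multibyte
// character in the current C locale (set beforehand with setlocale).
inline bool is_single_mb_char(const std::string& s)
{
    if (s.empty())
        return false;

    std::mblen(NULL, 0);  // reset any shift state left by earlier calls
    const int n = std::mblen(s.data(), s.size());

    // n <= 0 means invalid, incomplete, or a null character
    return n > 0 && static_cast<std::size_t>(n) == s.size();
}
```

Because the result depends entirely on the active locale, the same two bytes may be accepted under one locale and rejected under another, which is the reason this method only answers "is this sequence valid?" and says nothing about which character comes next.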

4) Example (Japanese: shiftjis):
Input range: “[\xDF-\x81\x40]” -> [“Halfwidth Katakana Semi-voiced Sound Mark” – “Ideographic Space”]
Output sequence: “\xDF\x81\x40”
Unicode equivalent: “\xFF9F\x3000”

In this case the first character in the range is a single-byte (SB) character and the second is a double-byte character. Again, it does not matter whether a user is likely to enter that range; the code must be able to create the valid output sequence. Also note that the Unicode sequence is even worse than the ShiftJis sequence.

Note: All character sequences were retrieved from the ‘Character Map’ program and may not reflect the ‘locale’ sequence, which is part of the problem.

As you can see, it is not a simple matter of incrementing the character as we would do in ASCII. There is no ‘istrail’ method in the standard, and even if there were, it might not be enough to solve the problem.

The following is the current implementation, which will only work for examples 1 and 3:
template<class StringType_>
void basic_regexp<StringType_>::insert_range(
    string_set_type& c_set,         // set to insert range into
    const string_type& s1,          // lower range: single-character string
    const string_type& s2) const    // upper range: single-character string
{
    // Both character substrings must be the same size (multibyte test)
    if( s1.size() != s2.size() )
        throw_char_range_error();

    // Get possible single-byte lower and upper range characters
    char_type lower = s1[0];
    char_type upper = s2[0];
    char_type leadbyte = 0;

    // Get lead byte if any
    if( s1.size() > 1 )
        leadbyte = lower;

    // If lead byte
    if( leadbyte )
    {
        // Lead bytes must match, since we are trying to avoid
        // spanning non-contiguous sequences.
        if( lower != upper )
            throw_char_range_error();

        // Use trailing bytes as lower and upper range bytes
        lower = s1[1];
        upper = s2[1];
    }

    // Make sure the range is valid
    if( lower > upper )
        throw_char_range_error();

    // Insert range of characters into substring set
    string_type s;
    for( ; lower <= upper; ++lower )
    {
        s.clear();
        if( leadbyte )
        {
            s += leadbyte;
            s += lower;

            // Avoid invalid character sequences
            if( mb_len(s,2) != 2 )
                continue;
        }
        else
        {
            s += lower;
        }
        c_set.insert(s);
    }
}


I know of no algorithm that can solve the problems presented here. The solution must be similar in simplicity to the above implementation. I can envision a method to solve some of these problems, but it would be grossly inefficient.
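
One way to picture such an inefficient method is plain brute force: walk every 1-byte and 2-byte candidate between the two endpoints and keep only those a validity test accepts. The following is only a sketch under assumptions that are not in the original post: characters are at most two bytes long, single-byte characters sort before double-byte ones (as in example 4), and the caller supplies the validity test. All names (‘key_of’, ‘char_of’, ‘toy_is_valid’, ‘brute_force_range’) are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

// Order candidates so single-byte characters come before double-byte
// ones, then by byte value (an assumed ordering, see example 4).
inline unsigned long key_of(const std::string& s)
{
    const unsigned long b0 = static_cast<unsigned char>(s[0]);
    if (s.size() == 1)
        return b0;
    const unsigned long b1 = static_cast<unsigned char>(s[1]);
    return 0x100UL + (b0 << 8) + b1;
}

// Inverse of key_of: rebuild the 1- or 2-byte string for a key.
inline std::string char_of(unsigned long key)
{
    std::string s;
    if (key >= 0x100UL)
    {
        key -= 0x100UL;
        s += static_cast<char>(static_cast<unsigned char>(key >> 8));
    }
    s += static_cast<char>(static_cast<unsigned char>(key & 0xFFUL));
    return s;
}

// Stand-in validity test for demonstration ONLY. It hard-codes a toy
// two-byte layout, which restriction 4 forbids in real code; a real
// caller would pass a locale-based check (for example one built on mblen).
inline bool toy_is_valid(const std::string& s)
{
    const unsigned char b0 = static_cast<unsigned char>(s[0]);
    if (s.size() == 1)
        return b0 <= 0x7F || (b0 >= 0xA1 && b0 <= 0xDF);
    const unsigned char b1 = static_cast<unsigned char>(s[1]);
    return b0 >= 0x81 && b0 <= 0x9F
        && b1 >= 0x40 && b1 <= 0xFC && b1 != 0x7F;
}

// Try every candidate from lo to hi (inclusive) and keep the valid ones.
// An empty set is returned when lo sorts after hi.
template<class Pred>
std::set<std::string> brute_force_range(const std::string& lo,
                                        const std::string& hi,
                                        Pred is_valid)
{
    assert(lo.size() == 1 || lo.size() == 2);
    assert(hi.size() == 1 || hi.size() == 2);

    std::set<std::string> out;
    const unsigned long last = key_of(hi);
    for (unsigned long k = key_of(lo); k <= last; ++k)
    {
        const std::string s = char_of(k);
        if (is_valid(s))
            out.insert(s);
    }
    return out;
}
```

With the toy test, the range from “\xDF” to “\x81\x40” yields just those two characters, but only after tens of thousands of rejected candidates. In the worst case the loop makes 65,792 validity calls for a single range, which is what makes the approach grossly inefficient.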

If there is an example of a possible solution anywhere I would be interested.

Throw any ideas you may have at the problem, but please note that the restrictions are there to ensure portability. In other words it must be as usable as any STL implementation would be.

Thanks to any and all of those who examine this problem, even if you cannot think of a solution worth posting.


INTP
"Program testing can be used to show the presence of bugs, but never to show their absence." - Edsger Dijkstra

Answer: Re: MBCS - Character range generation
Michael Dunn (sitebuilder), 16-Apr-07 9:57
General: Re: MBCS - Character range generation
John R. Shaw, 16-Apr-07 12:28
Question: where are my attributes
zqueezy, 16-Apr-07 5:33
Answer: Re: where are my attributes
toxcct, 16-Apr-07 5:36
General: Re: where are my attributes
zqueezy, 16-Apr-07 5:46
General: Re: where are my attributes
Hamid_RT, 16-Apr-07 9:28
General: Re: where are my attributes
toxcct, 16-Apr-07 9:55
Answer: Re: where are my attributes
Mark Salsbery, 16-Apr-07 5:56
General: Re: where are my attributes
zqueezy, 16-Apr-07 6:09
Question: Heart Beat
nahitan, 16-Apr-07 4:52
Answer: Re: Heart Beat
pbraun, 16-Apr-07 5:46
Answer: Re: Heart Beat
Moak, 17-Apr-07 1:09
Answer: Re: Heart Beat
cmk, 17-Apr-07 10:04
Question: Herat Beat
nahitan, 16-Apr-07 4:49
General: Re: Herat Beat
Moak, 13-Mar-08 14:51
Question: What compilers do you use?
Code2326, 16-Apr-07 4:48
Answer: Re: What compilers do you use?
toxcct, 16-Apr-07 5:19
