Van's House

  • Protected or Private?

    As the designer of base class, you may hesitate whether to use private or protect access control. Then, let's try the following examples:

    1. Call protected member function

    #include <cstdio>

    class A
    {
    protected:
        void b() {printf("Oops!\n");}
    };

    void f(A* a)
    {
        class A_hack:public A
        {
            friend void f(A*);
        };
        static_cast<A_hack *>(a)->b();
    }

    class B
    {
    public:
        void f(A* a)
        {
            class A_hack:public A
            {
                friend B;
            };
            static_cast<A_hack *>(a)->b();
        }
    };

    int main()
    {
        f(NULL);
        B().f(NULL);
    }

    Although the result of the cast is undefined as stated in the standard, if no this pointer adjustment happens and the layout of A is the same as A_hack which are normally the case, the code will break the access control.

    For the evil user, his purpose is archieved.

    2. Call pure virtual function

    class A
    {
    protected:
        virtual void Fun() =0;
    };

    class B:public A
    {
    public:
        B() {Dummy();}

    private:
        void Dummy() {Fun();}
    };
    class C:public B
    {
    public:
        virtual void Fun() {}
    };

    int main()
    {
        C c;
    }

    Do you ever wonder how to call pure virtual function? Do you like to see what does _purecall in VC do? Just try the above code.

    The problem here is because of the protect access of "Fun" in base class. Of course, you should not call virtual function in ctor (they will not have polymorphic behavior), but when you call Dummy in ctor, you may not realize the fact that Dummy will call virtual function internally. Then the disaster happens.

    If what you want by using virtual is to allow the derived class to provide some specific behavior, you'd better declare the function as private.

    Although both the above code are nonconformant, they at least show the possibility that your user can do more than what you expect for protect.

    If you don't even have confidence with private (Like evil #define private public which definitely violates One Definition Rule), you'd better use the PImpl Idiom to implement your interface.

  • IDA Pro 5.3 Demo is released

    IDA Pro is the world-class disassembler. It's a very useful reverse engineering tool.

    Now the demo of the newest version 5.3 is available: IDA Pro 5.3 demo download

    You can also try the freeware version (a little out of date): IDA Pro 4.9 Freeware

  • Minimize the size of your program - low level

    The previous post: Minimize the size of your program - high level

    Do you know the smallest x86 PE file in the world?
    Here is the answer: http://www.phreedom.org/solar/code/tinype/

    • Smallest possible PE file: 97 bytes
    • Smallest possible PE file on Windows 2000: 133 bytes
    • Smallest PE file that downloads a file over WebDAV and executes it: 133 bytes

    BTW: You can use IDA Pro, WinDBG or OllyDbg to open and debug the tiny PE :P

    The main idea of tiny PE is simple: Striping unnecessary bytes and collapsing data structure by overlapping. So the overhead of PE format will be minimized. Here, we are more concerned about how to decrease the size of our code.

    To investigate the code generated by the compiler, you can use the /FAsc flag.

    First, we can apply the technique used in tiny PE to collapse the PE Header. There're lots of unused fields left, so we can move our data into these fields to save space. Another benefit is that, the address difference of these data will be within 0x80, which is suitable for our further optimization of the code.

    Now let's begin our optimization!

    1. Algorithm

    a. Local variable initialization

    Normally, the initialization involves lots of bytes of code.

        unsigned long keyT[4]={0xCBDCEDFE,0x8798A9BA,0x43546576,0x00102132};

    The above code will be translated to assembly like:

        mov    DWORD PTR [ebp+offset8], imm32

    It is 7 bytes. Then 7*4 = 28 bytes to initialize the whole array.

    Instead, we can change the initialization into for-loop.

        for (size_t i=0;i<16;++i) ((char *)keyT)[i]=0xFE-i*0x11;
        ((char *)keyT)[15]++;

    The above code can be implemented as:

        lea    edx, [ebp+keyT-1]
        xor    ecx, ecx
        mov    cl, 16
        mov    al, 0FEh
    loc_4:
        inc    edx
        mov    [edx], al
        sub    al, 11h
        loop    loc_4
        inc    byte [edx]

    28 -> 18 bytes!

    b. memcpy

    By default, the compiler will call the API memcpy in msvcrt.dll. But the cost is high. The alternative is to use intrinsic version of memcpy which takes advantage of "rep movs" instruction instead.
    If we take a closer look at the algorithm, it turns out that the code simply rotates keyT array t bytes left.

        memcpy(Buf1,(char *)keyT+t,0x10-t);
        memcpy((char *)Buf1+0x10-t,keyT,t);
        memcpy(keyT,Buf1,16);

    That means most of the parameter for memcpy can be reused, which will save lots of store/load.

        ;ebx == t, edi == end of keyT
        xor    ecx, ecx
        mov    cl, 10h
        sub    cl, bl
        mov    esi, edi
        sub    esi, ecx
        lea    edi, [ebp+Buf1]
        push    edi
        rep    movsb

        mov    cl, bl
        lea    esi, [ebp+keyT]
        rep    movsb

        pop    esi
        mov    cl, 10h
        rep    movsb

    Thanks to the value reuse, the last two memcpy only use 12 bytes, and 35 bytes in total! In comparison, 54 bytes before optimization, not including more than 10 bytes for PE format overhead related to reference memcpy API

    2. 8-bit vs 32-bit

    In 32-bit environment, all the data and address are 32-bit. Then the 32-bit immediate numbers and long jump instructions can contribute to large portion of our code.
    We can try to make these value small enough, so they can be held in 8-bit data type.
    For example, when we call:

        printf("%08X",keyT[k]);

    The compiler will generate the code like this:

        push    hexstr+ImageBase    ; "%08X"
        call    [printf+ImageBase]

    Because we arrange the positions of hexstr and printf within 80 bytes, we can use the following code instead:

        mov    eax, hexstr+ImageBase
        push    eax
        add    al, printf-hexstr
        call    [eax]

    Because printf uses cdecl calling convention, the caller is responsible to balance the stack. When we pop up the arguments from stack, we can reuse the pushed "eax" for "printf("\n");"

        add    al, newline-hexstr
        push    eax
        add    al, printf-newline
        call    [eax]

    22 -> 17 bytes!

    We can also split the long jmp into two short jmps (5 -> 4 bytes)

    3. Misc

    • Prevent using 16-bit instruction.
    • Use "loop" instruction to do the loop.
    • Use enter/leave the setup stack.
    • Use "lods/stos" to load and store from the memory.
    • Use "pusha/popa" for multiple push/pop.
    • Sometimes, the instruction for "eax" will be a little shorter than for other registers.
    • Sometimes, 8-bit instruction will be a little shorter than the 32-bit version.
    • Return value is not needed here, and OS doesn't require us to save the registers. So we can omit both the "xor eax, eax" at the end of the program and the "push/pop" instructions to save the register.
    • Register allocation is also very important. If chosen properly, we can share their values and prevent many load/store.
    • Take advantage of al, ah and alike. Otherwise the limited number of registers on X86 will be a big issue.

    For more detailed information on how to optimize your code for size, please see the excellent optimization manual by Agner Fog.

    BTW: Don't forget to set the "Debug Directory Size" in the PE Header to 0, otherwise the program will crash on WinXP. Because that data member overlaps with our code, that means we have to insert instructions like this into our code:

        jmp    short skip_debug
        times 4 db 0 ;ensure debug directory size is 0
    skip_debug:

    Here is the source (you can compile it using "nasmw -f bin") and the result:

      PE Header Code Data Total size
    Before optimization 480 272 144 896
    After optimization 136 180 7 323

  • Minimize the size of your program - assembly source

    ; based on tiny.asm, modified by xiangfan

    ;e_cblp 2b
    ;tbl 1b
    ;hexstr 1b
    ;LoaderFlags 4b

    BITS 32

    ;
    ; MZ header
    ;
    ; The only two fields that matter are e_magic and e_lfanew

    mzhdr:
        dw "MZ"       ; e_magic
        dw 0          ; e_cblp UNUSED

    ;
    ; PE signature
    ;

    pesig:
        dd "PE"       ; e_cp UNUSED          ; PE signature
                      ; e_crlc UNUSED

    ;
    ; PE header
    ;

    pehdr:
        dw 0x014C     ; e_cparhdr UNUSED     ; Machine (Intel 386)
        dw 1          ; e_minalloc UNUSED    ; NumberOfSections

        ;dd 0xC3582A6A ; e_maxalloc UNUSED    ; TimeDateStamp UNUSED <0xC>

    start:
        jmp short main

    tbl:
        db 1,1,2,3,7,5,6,4,8
        db 0 ;align 4

        ;dd 0          ; e_sp UNUSED          ; PointerToSymbolTable UNUSED
                      ; e_csum UNUSED

        ;dd 0          ; e_ip UNUSED          ; NumberOfSymbols UNUSED
                      ; e_cs UNUSED

        ;==4
        dw sections-opthdr ; e_lsarlc UNUSED ; SizeOfOptionalHeader
        dw 0x10F      ; e_ovno UNUSED        ; Characteristics

    ;
    ; PE optional header
    ;
    ; The debug directory size at offset 0x94 from here must be 0

    filealign equ 4
    sectalign equ 4   ; must be 4 because of e_lfanew

    %define round(n, r) (((n+(r-1))/r)*r)

    opthdr:
    printf_:
        dw 0x10B      ; e_res UNUSED         ; Magic (PE32)
        ;db 8                                 ; MajorLinkerVersion UNUSED
        ;db 0                                 ; MinorLinkerVersion UNUSED

        ;dw 0x29E
        db "pr"

    ;
    ; PE code section
    ;

    sections:
        ;dd 0                        ; SizeOfCode UNUSED                  ; Name UNUSED                
        ;dd 0          ; e_oemid UNUSED       ; SizeOfInitializedData UNUSED                                    
                      ; e_oeminfo UNUSED

        db "intf",0

        db 0,0,0 ;align

        dd codesize   ; e_res2 UNUSED        ; SizeOfUninitializedData UNUSED     ; VirtualSize
        ;==C
        dd start                             ; AddressOfEntryPoint                ; VirtualAddress
        dd codesize                          ; BaseOfCode UNUSED                  ; SizeOfRawData
        ;==C
        dd start                             ; BaseOfData UNUSED                  ; PointerToRawData

    ;
    ; Import table (array of IMAGE_IMPORT_DESCRIPTOR structures)
    ;

    ImageBase equ 0x400000

    idata:
        dd ImageBase                          ; ImageBase                          ; PointerToRelocations UNUSED ; OriginalFirstThunk UNUSED
        ;==4
        dd sectalign  ; e_lfanew             ; SectionAlignment                   ; PointerToLinenumbers UNUSED ; TimeDateStamp UNUSED
        ;==4
        dd filealign                         ; FileAlignment                      ; NumberOfRelocations UNUSED  ; ForwarderChain UNUSED
                                                                                  ; NumberOfLinenumbers UNUSED
        dd msvcrt                            ; MajorOperatingSystemVersion UNUSED ; Characteristics UNUSED      ; Name
                                             ; MinorOperatingSystemVersion UNUSED                               ; FirstThunk
        dd iat                               ; MajoirImageVersion UNUSED
                                             ; MinorImageVersion UNUSED
        dw 4                                 ; MajorSubsystemVersion                                            ; OriginalFirstThunk UNUSED
        ;dw 0                                 ; MinorSubsystemVersion UNUSED
        ;dd 0                                 ; Win32VersionValue UNUSED                                         ; TimeDateStamp UNUSED
    hexstr:
        db "%08X",0
        db 0 ;align

        dd round(hdrsize, sectalign)+round(codesize,sectalign) ; SizeOfImage                                    ; ForwarderChain UNUSED
        ;==8C
        dd round(hdrsize, filealign)         ; SizeOfHeaders                                                    ; Name UNUSED
        dd 0                                 ; CheckSum UNUSED                                                  ; FirstThunk

    idatasize equ $ - idata

        dw 3                                 ; Subsystem (Win32 Console)
        ;dw 0                                 ; DllCharacteristics UNUSED
    sstr:
        db "%s"
        dd 0                                 ; SizeOfStackReserve
        dd 0                                 ; SizeOfStackCommit
        dd 0                                 ; SizeOfHeapReserve
        dd 0                                 ; SizeOfHeapCommit
        ;dd 0                                 ; LoaderFlags UNUSED
    dstr:
        db "%d",0
        db 0
        dd 2                                 ; NumberOfRvaAndSizes

    ;
    ; The DLL name should be at most 16 bytes, including the null terminator
    ;

    ;
    ; Data directories
    ;
    ; The debug directory size at offset 0x34 from here must be 0

    scanf_:
    newline:
        db 10,0
    ;dw 0x2AB
    db "scanf",0

        ;dd 0                                 ; Export Table UNUSED
        ;dd 0

        ;==38
        dd idata - $$                        ; Import Table

    hdrsize equ $ - $$

    main:
    var_42C    equ -42Ch
    var_28        equ -28h
    var_18        equ -18h
    ;var_14        equ -14h
    ;var_10        equ -10h
    ;var_C        equ -0Ch
    ;var_8        equ -8
    ;var_4        equ -4
            enter    42Ch, 0
            push    esp    ;[ebp+42C]
            ;push    dstr+ImageBase    ; "%d"
            ;call    [scanf+ImageBase]
            mov    eax, dstr+ImageBase
            push    eax
            add    al, scanf-dstr
            call    [eax]
            pop    eax    ;dstr
            pop    ecx
            ;pushad    ;ebx,esi,edi
            mov    edi, [ecx]
            test    edi, edi
            jle    short loc_4002D0
            ;times 4 db 0
            add    al, newline-dstr
    loc_400235:                ; CODE XREF: start+100j
            ;lea    esi, [esp]    ;pushad==20h
            push    esp
            add    al, sstr-newline
            push    eax
            nop
            jmp    short skip_debug
    iat:
    printf:
            dd printf_
    scanf:
            dd scanf_
            ;dd 0
            times 4 db 0 ;ensure debug directory size is 0
    skip_debug:
            add    al, scanf-sstr
            call    [eax]
            pop    esi    ; "%s"
            pop    esi

            ;push    sstr+ImageBase    ; "%s"
            ;call    [scanf+ImageBase]
            ;pop    ecx
            ;pop    ecx
            ;mov    dword [ebp+var_18], 0CBDCEDFEh
            ;mov    dword [ebp+var_14], 8798A9BAh
            ;mov    dword [ebp+var_10], 43546576h
            ;mov    dword [ebp+var_C], 102132h

            lea    edx, [ebp+var_18-1]
            ;xor    ecx, ecx
            xchg    eax, ecx
            mov    cl, 16
            mov    al, 0FEh
    loc_4:
            inc    edx
            mov    [edx], al
            sub    al, 11h
            loop    loc_4
            ;mov    byte [edx], 0
            inc    byte [edx]

    loc_400272:                ; CODE XREF: start+D0j
            ;movsx    ebx, byte [esi]
            ;inc    esi
            lodsb
            movsx    ebx, al
            test    ebx, ebx
            jz    short loc_4002DA
            pusha    ;edi,esi
            and    al, 0Fh
            xchg    eax, ebx
            ;mov    eax, ebx
            cdq
            ;xor    ecx, ecx    ;ecx is alway =0 here
            mov    cl, 9
            idiv    ecx
            mov    cl, 4
            test    edx, edx
            lea    edi, [ebp+var_18]
            jl    short loc_40028F
            mov    dl, byte [edx+tbl+ImageBase]
    loc_40028F:                ; CODE XREF: start+9Aj
            mov    eax, edx
            imul    eax, dword [edi]
            add    eax, 0FBFCh
            stosd
            loop    loc_40028F

            ;xor    ecx, ecx
            ;and    bl, 0Fh
            mov    cl, 10h
            sub    cl, bl
            mov    esi, edi
            sub    esi, ecx
            lea    edi, [ebp+var_28]
            push    edi
            rep    movsb
            mov    cl, bl
            lea    esi, [ebp+var_18]
            rep    movsb
            ;lea    esi, [ebp+var_28]
            pop    esi
            mov    cl, 10h
            rep    movsb
            popa
            jmp    short loc_400272
    loc_4002D0:
            jmp    short loc_400311
    loc_4002D2:
            jmp    short loc_400235
    ; ---------------------------------------------------------------------------

    loc_4002DA:                ; CODE XREF: start+6Cj
            lea    esi, [ebp+var_18]
            mov    bl, 4
    loc_4002E0:                ; CODE XREF: start+EBj
            lodsd
            push    eax
            ;push    hexstr+ImageBase    ; "%08X"
            ;call    [printf+ImageBase]
            mov    eax, hexstr+ImageBase
            push    eax
            add    al, printf-hexstr
            call    [eax]
            pop    eax
            dec    ebx    ;ebx==0 after 2DA
            pop    ecx
            jnz    short loc_4002E0

            ;push    newline+ImageBase ; "\n"
            ;call    [printf+ImageBase]
            add    al, newline-hexstr
            push    eax
            add    al, printf-newline
            call    [eax]
            dec    edi
            pop    eax
            jnz    loc_4002D2

    loc_400311:                ; CODE XREF: start+1Ej
            ;popad
            ;xor    eax, eax
            leave
            retn

    msvcrt:
        db "msvcrt", 0

    codesize equ $ - start

    filesize equ $ - $$

  • Minimize the size of your program - high level

    Nowadays, the size of the program is no longer a big issue. But it is still very interesting to write program with extremely small size.

    Here, I'll limit the discussion on X86 + PE format to simplify the analysis. And I choose VC9 as our C++ compiler. The target program reads the number of queries first, then calculates the digest for each query and outputs the result.

    If you use the default setting to compile the code, you will get an executable of size 56320 bytes.

    cl /c test.cpp
    link test.obj

    That is small, right? But most of the code in the executable is not yours. By default, the compiler will statically link to the CRT library, so they will reside in your program. We'd better get rid of these stuff. Let's add /MD flag to cl.exe. WOW! test.exe is only 5632 bytes now! But that is not the end of the story, we can decrease the size even further.

    If you check the file generated, you can still find lots of stuff not belong to you. For example, the entry point procedure "start" is added "magically" by the compiler to initialize the CRT (you can check the source code of this function in Microsoft Visual Studio 9.0\VC\crt\src\crt0.c). In our program, these work can be safely ignored. We can pass /entry:"main" flag to tell the linker to use our main function as the entry point directly. Unfortunately, you will get the following linker errors if you do that:

    MSVCRT.lib(gs_report.obj) : error LNK2019: unresolved external symbol __imp__TerminateProcess@8 referenced in function ___report_gsfailure
    MSVCRT.lib(gs_report.obj) : error LNK2019: unresolved external symbol __imp__GetCurrentProcess@0 referenced in function ___report_gsfailure
    MSVCRT.lib(gs_report.obj) : error LNK2019: unresolved external symbol __imp__UnhandledExceptionFilter@4 referenced in function ___report_gsfailure
    MSVCRT.lib(gs_report.obj) : error LNK2019: unresolved external symbol __imp__SetUnhandledExceptionFilter@4 referenced in function ___report_gsfailure
    MSVCRT.lib(gs_report.obj) : error LNK2019: unresolved external symbol __imp__IsDebuggerPresent@0 referenced in function ___report_gsfailure

    Don't worry. That is what /GS flag does, and we will discuss it later. Here we simply pass kernel32.lib to link.exe to resolve the failure. The size of test.exe is now 3072 bytes.

    /GS is used to detect buffer overrun, and is on by default. This flag is very useful, but not suitable when size is an issue. So we turn it off by passing /GS- to cl.exe and test.exe decrease to 2560 bytes!

    There are no redundant code now, and we can focus on other stuff in our program. In PE format, different sections are used to store different kinds of data. For example, VC store code into ".text" and data into ".data". Each section will align to the 512-byte boundary in the file. That is why all the previous size of test.exe are multiply of 512. We can merge them to save the space. These flags for link.exe are what we need: /MERGE:.rdata=.text /MERGE:.data=.text. We will then have test.exe of 1536 bytes!

    With /O1 for cl.exe (minimize size), we can decrease the size to 1024 bytes. After loosing the section alignment requirement to 4 by linker flag /ALIGN:4, we achieve the size of 896.

    All of these tunings are high level, and finally we decrease the size of the original exe by more than 98%!

    cl /O1 /MD /GS- /c test.cpp
    link test.obj /entry:"main" /MERGE:.rdata=.text /MERGE:.data=.text /ALIGN:4

    BTW: If the main is completely empty, the above flag setting can generate a program of only 468 bytes. That is the lower bound of high level size optimization. To go even further, we have to resort to low level approach. That will be described in the next post.

    The program:

    #include <cstdio>
    #include <cstring>

    static unsigned char Tbl[9]={1,1,2,3,7,5,6,4,8};

    int main()
    {
        unsigned char BufT[0x401];
        int n;
        scanf("%d",&n);
        for (;n>0;--n) {
            scanf("%s",BufT);

            unsigned long keyT[4]={0xCBDCEDFE,0x8798A9BA,0x43546576,0x00102132};
            unsigned char *Buf=BufT;
            int tt;
            while (tt=*Buf++) {
                {
                    int t=Tbl[tt%9];
                    for (int k=0;k<4;++k) keyT[k]=keyT[k]*t+0xFBFC;
                }
                unsigned long Buf1[4];
                int t=tt&0xF;
                memcpy(Buf1,(char *)keyT+t,0x10-t);
                memcpy((char *)Buf1+0x10-t,keyT,t);
                memcpy(keyT,Buf1,16);
            }
           
            for (size_t k=0;k<4;++k) printf("%08X",keyT[k]);
            printf("\n");
        }
        return 0;
    }

  • Obfuscate your code

    Obfuscation is widely used to protect your code from reverse engineering.
    Here is one example which takes advantage of indirected call and opcode overlap in X86:

    __declspec(naked)
    void Fun1()
    {
         __asm {
             //obfuscation chunk
             call LABEL1
    LABEL1:
             pop eax
             add eax, 6
             jmp eax
     
             //real function body
             ret
         }
    }
     
    __declspec(naked)
    void Fun2()
    {
         __asm {
             //obfuscation chunk
             call LABEL1
    LABEL1:
             pop eax
             add eax, 7
    /*
             jmp -1
             jmp eax (merged with the previous instruction) 0xEB 0xFF 0xFF 0xE0 -> 0xEB 0xFF 0xE0
    */
             __emit 0xEB
             __emit 0xFF
             __emit 0xE0

             //real function body
             ret
         }
    }

    int main()
    {
    #if 0
         Fun1();
    #else
         Fun2();
    #endif
    }

    In Fun1, the obfuscation chunk do the following work:

    • Get the current EIP (LABEL1)
    • Adjust the EIP to the target function
    • Jump to the target

    In Fun2, it merges the two jump instructions which will confuse the disassembler. To make the analysis even harder, we can rely on the fact that “add eax, 7” (assume the real function body is after our obfuscation chunk) will never overflow, and replace “jmp –1” with “jno -1”. Or we can use more complicated arithmetic to compute the target address.

    Other kinds of obfuscations include:

    • API Call Obfuscation (LoadLibrary+Encrypted Parameter)
    • Data Encryption (String Literal)
    • Code CheckSum + AntiDebugger
    • Dynamic Data Extraction
  • Play with the C++ compiler - compile nightmare

     

    Playing with the compiler is interesting. One challenge for the compiler writer is compilation performance. There're many kinds of C++ code which are the nightmare for the compiler. Now let's go!

    1. Preprocess

    a. Self Inclusion (GCC only)
    Normally, the compiler will limit the nest level of inclusion. In order to challenge the compiler, you have to stop the inclusion at appropriate stage. GCC provides a useful macro __INCLUDE_LEVEL__ to do the work.
    (We can write similar code for other compiler, although less elegant)

    #if __INCLUDE_LEVEL__<199
    #include __FILE__
    #include __FILE__
    #endif

    b. Macro Expansion Explosion
    By using macro, the code after the preprocess may reach O(2n). For example:

    #define F1(x) x,x
    #define F2(x) F1(x),F1(x)
    #define F3(x) F2(x),F2(x)
    #define F4(x) F3(x),F3(x)
    #define F5(x) F4(x),F4(x)
    #define F6(x) F5(x),F5(x)
    #define F7(x) F6(x),F6(x)
    #define F8(x) F7(x),F7(x)
    #define F9(x) F8(x),F8(x)
    #define G1(x) F9(x),F9(x)
    #define G2(x) G1(x),G1(x)
    #define G3(x) G2(x),G2(x)
    #define G4(x) G3(x),G3(x)
    #define G5(x) G4(x),G4(x)
    #define G6(x) G5(x),G5(x)
    #define G7(x) G6(x),G6(x)
    #define G8(x) G7(x),G7(x)
    #define G9(x) G8(x),G8(x)

    int main()
    {
        G9(1);
    }

    Of course, different compilers handle this kind of overflow differently. GCC will fail directly (when all of the resource are eaten up), and VC will give ICE (Internal Compiler Error).

    2. Template

    a. Nesting
    Similarly, there're nest level limit for template, but we can easily deceive the compiler.
    Some version of GCC will enter infinite loop when compiles the following code:

    #include <cstddef>
    template <class T>
    struct Test {
        static const size_t Value=Test<Test<T> >::Value;
    };

    VC is wise enough to stop when the nest level limit is exceeded in the above code. But we can use one of VC's bug (you can also treat it as an extension) to write template which requires O(na) compilation time:

    #include <cstddef> 

    #define INNER(A3,N3,A2,N2) \
    template<size_t N3>\
    struct A3\
    {\
        enum {N=A3<N3-1>::N+1};\
    };\
    template<>\
    struct A3<0>\
    {\
        enum {N=A2<N2-1>::N};\
    };

    #define OUTER(A2,N2,A1,N1,A3,CONTENT) \
    template<size_t N2>\
    struct A2\
    {\
        CONTENT\
        \
        enum {N=A3<N2>::N};\
    };\
    template<>\
    struct A2<0>\
    {\
        enum {N=A1<N1-1>::N};\
    };

    #define LEVEL2(a,b,c) INNER(A##b,N##b,A##a,N##a)
    #define LEVEL3(a,b,c) OUTER(A##b,N##b,A##a,N##a,A##c,LEVEL2(a##1,b##1,c##1))
    #define LEVEL4(a,b,c) OUTER(A##b,N##b,A##a,N##a,A##c,LEVEL3(a##1,b##1,c##1))
    #define LEVEL5(a,b,c) OUTER(A##b,N##b,A##a,N##a,A##c,LEVEL4(a##1,b##1,c##1))

    template<size_t N1>
    struct A1
    {
        LEVEL5(1,11,111)
        enum {N=A11<N1>::N};
    };
    template<>
    struct A1<0>
    {
        enum {N=0};
    };

    int main()
    {
        return A1<20>::N;
    }

    The interesting thing is that, the standard doesn't allow this kind of specialization in template class. Is this a bug or extension of VC?
    A member or a member template may be nested within many enclosing class templates. In an explicit specialization for such a member, the member declaration shall be preceded by a template<> for each enclosing class template that is explicitly specialized.

    Based on similar idea, we can write portable version for both GCC and VC.

    #define INNER(A3,N3) \
    template<class N3>\
    struct A3\
    {\
        static void f() {}\
    };

    #define OUTER(A2,N2,A1,CONTENT) \
    template<class N2>\
    struct A2\
    {\
        CONTENT\
        \
        static void f() {\
            A1<char>::f();\
            A1<short>::f();\
            A1<int>::f();\
            A1<long>::f();\
            A1<unsigned char>::f();\
            A1<unsigned short>::f();\
            A1<unsigned int>::f();\
            A1<unsigned long>::f();\
        }\
    };

    #define LEVEL2(a,b) INNER(A##a,T##a)
    #define LEVEL3(a,b) OUTER(A##a,T##a,A##b,LEVEL2(a##1,b##1))
    #define LEVEL4(a,b) OUTER(A##a,T##a,A##b,LEVEL3(a##1,b##1))
    #define LEVEL5(a,b) OUTER(A##a,T##a,A##b,LEVEL4(a##1,b##1))
    #define LEVEL6(a,b) OUTER(A##a,T##a,A##b,LEVEL5(a##1,b##1))
    #define LEVEL7(a,b) OUTER(A##a,T##a,A##b,LEVEL6(a##1,b##1))
    #define LEVEL8(a,b) OUTER(A##a,T##a,A##b,LEVEL7(a##1,b##1))

    LEVEL8(1,11)

    int main()
    {
        A1<int>::f();
    }

    b. Multiple inheritance
    C++ allows multiple inheritance, it can be used to hang the compiler. Try the following "Fibonacci" inheritance:

    template<size_t N>
    class Evil:virtual public Evil<N-1>,virtual public Evil<N-2>
    {
    public:
        virtual ~Evil() {}
    };

    template<>
    class Evil<1>
    {
    };
    template<>
    class Evil<0>
    {
    };

    int main()
    {
        Evil<100> evil;
    }

    c. OLE (Output limit exceeded)
    Normally, the compile time for template is O(n), but in contrast, the amount of error message will be O(n2). Taking advantage of this, you can make the error message "Output Limit Exceeded". For example:

    #define ClassName A

    template <int N>
    class ClassName
    {
        enum {Value=ClassName<N-1>::Value};
    };

    int main()
    {
        int n=ClassName<0>::Value;
    }

    If I change "ClassName" to a very long name (which is widely supported by modern compilers), the error message will be easily OLE.
    PS: The above code will cause ICE in VC8. This is the bug of VC8, and VC9 is happy with the code.

  • Magic behind ValueType.Equals

    In "Effective C#", Bill Wagner says "Always create an override of ValueType.Equals() whenever you create a value type". His main consideration is the performance, because reflection is needed to compare two value types memberwisely.

    In fact, the framework provides optimization for "simple" value type. Let's find out the magic.

    Here is the code disassembled by Reflector:

    bool Equals(object obj)
    {
        //Compare type 

        object a = this;
        if (CanCompareBits(this))
        {
            return FastEqualsCheck(a, obj);
        } 

        //Compare using reflection
    }

    The magic is the two functions "CanCompareBits" and "FastEqualsCheck". They both have attribute "[MethodImpl(MethodImplOptions.InternalCall)]", which indicates that their implementations are in native dlls.

    Walkthrough the source of rotor, you can find that these functions are in "clr\src\vm\comutilnative.cpp"

    The comment of CanCompareBits says "Return true if the valuetype does not contain pointer and is tightly packed". And FastEqualsCheck use "memcmp" to speed up the comparison.

    Then you may think you can safely rely on this optimization and stick to the default ValueType.Equals implementation. But wait a minute. Do you find anything wrong in CanCompareBits?

    The problem is that the condition in the comment doesn't ensure the bitwise comparison to work.

    Imagine you have a structure which only contains a float. What will occur if one contains +0.0, and the other contains -0.0? They should be the same, but the underlying binary representation are different.
    If you nest other structure which override the Equals method, that optimization will also fail.

    This should be a bug. But it also provide another reason for why you should always implement your own instance Equals for ValueType.

  • Debug vs Release

    Someone may wonder what the difference is between Debug and Release mode, and whether it is possible to mix them.

    Here is one example: http://forums.msdn.microsoft.com/en-US/vcgeneral/thread/775ce067-b225-4141-8b86-2d7e9b61db97/

    syperk said: "As a result, I've switched to compiling my debug builds using the /MD switch. If you do this, you also have to undefine the _DEBUG preprocessor macro. Otherwise you get lots of errors about _CrtDebugReportW() not found. This function is apparently inserted into your code if _DEBUG is defined, but is only defined in the debug CRT. As long as optimization is off it seems to be perfectly possible to debug programs compiled this way, and there are no concerns about CRT incompatibility.".

    But that is not true. There are many problems. In my opion, the most important one is that it is not a good idea to link different modules compiled with different compiler options in C++, especially when conditional compilation is involved.

    In fact, "Debug" & "Release" are only two sets of predefined compiler flags and macros definitions provided by the IDE (_DEBUG and NDEBUG are two representing macros). The compiler doesn't aware of that (but it has some magic behind compile option "MD" and "MDd").
    The main problem is that program compiled using "Debug" setting and the release runtime don't share the same compiler flags, and they are highly possible to be incompatible.

    For example, if you change "MDd" -> "MD" in your debug configuration, and compile the following code (don't remove the _DEBUG macro, otherwise the "Debug" version is no longer a debug version (/MDd will define this macro implicitly, and IDE will explicitly define it via compiler flag)) :

    #include <iostream>
    #include <string>
    using namespace std;
    int main()
    {
        std::string str;
        return 0;
    }

    You'll find that the program function properly. If you check the generated exe, you'll find that it refer to two dlls, one is msvcp90d.dll, the other is msvcr90.dll. Oops! You're mixing the debug and release runtime libraries! And that hide the potential incompatibility!

    The magic is here (in use_ansi.h):

    #ifdef _DEBUG
    #pragma comment(lib,"msvcprtd")
    #else /* _DEBUG */
    #pragma comment(lib,"msvcprt")
    #endif /* _DEBUG */

    If you disable this trick, and link to msvcp90.dll, you'll get unresolved symbol error, that is because the implementation of string class is different in "Debug" and "Release" mode. This is one example of the incompatibility.

    error LNK2019: unresolved external symbol "__declspec(dllimport) public: __thiscall std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >(struct std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >::_Has_debug_it)" (__imp_??0?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@QAE@U_Has_debug_it@01@@Z) referenced in function _main

    It's fortunate that this is a linker error. Otherwise, you'll waste lots of time in debugging to find out the subtle incompatibility.

    STL in VC9 devotes lots of efforts to prevent break like this, then you can disable macros like "_HAS_ITERATOR_DEBUGGING" as you like, and keep your code compatible with the runtime (which is compiled with the macro enabled). But it only applies when you stick to debug or stick to release. It is really tricky to guarantee that, and I think you should never rely on this.

  • Conformance macros in VC STL

    In VC STL, there are two macros called "_HAS_IMMUTABLE_SETS" and "_HAS_STRICT_CONFORMANCE" which are defined in yvals.h. They are related to some defects in the C++ standard 2003.

    1. _HAS_IMMUTABLE_SETS will influence the constness of set::iterator.

    Associative containers require its elements to be sorted, so modify them will be dangerous. But the standard doesn't prohibit the programmer to do that.

    In C++0x, the following statement is added for associative containter:
    For associative containers where the value type is the same as the key type, both iterator and const_iterator are constant iterators
    (In the standard, it is said:
    Constant iterators do not satisfy the requirements for output iterators, and the result of the expression *i (for constant iterator i) cannot be used in an expression where an lvalue is required.)
    That means you cannot modify the value of the elements using iterator any more.

    Discussion: http://www.cpptalk.net/image-vp135370.html

    2. _HAS_STRICT_CONFORMANCE will influence two places:

    a. For set, multiset, map and multimap

    void erase(const_iterator position);
    size_type erase(const key_type& x);
    void erase(const_iterator first, const_iterator last);

    ->

    iterator erase(const_iterator position);
    size_type erase(const key_type& x);
    iterator erase(const_iterator first, const_iterator last);

    In C++03, erase in associative containers will not return the iterator, which is incompatible with sequence containers. It is fixed in C++0x
    VC's extension provides the return value for erase. _HAS_STRICT_CONFORMANCE will disable this extension.

    Discussion: http://www.tech-archive.net/Archive/VC/microsoft.public.vc.stl/2005-10/msg00030.html
    Further Discussion: http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#103

    b. For codecvt::do_length(stateT& state, const externT* from, const externT* from_end, size_t max) const;

    In C++03, codecvt<wchar_t, char, mbstate_t> and codecvt<char, char, mbstate_t> should return the lesser of max and (from_end-from).
    This is weakened in C++0x. Only codecvt<char, char, mbstate_t> is required to behave like that.

    BTW: VC's STL implementation is provided by Dinkumware. You can see "P.J. Plauger" at the end of each STL file.

  • Online C++ Compilation Service

    There are many useful online c++ compilation service providers. You can use them to do conformance test and portability test without purchasing and installing various kinds of compilers.

    The following two sites are the most famous:

    Online Judge systems are also good places to do the test:

    You may be curious about the compiler version and compile options they use to compile your code. Some sites will show you the information. But some won't. Then you're on your own way to discover the secret.

    First, let's check some tricks to ask the compiler to tells us some useful information

    1. Output macro
    In C++, string literal can't be used as template argument. Most of the compilers will include the content of the string literal when they generate the error message. The following code snippet shows the magic.

    template<const char *> class A {};
    #define F0(x) #x
    #define F1(x) F0(x)
    #define V1 F1(__GNUC__)
    int main()
    {
        A<V1> a1;
    }

    Here, macro F0 and F1 are used to convert the integral into string. The compiler will show message like "literal "3" is not a valid template argument because it is the address of an object with static linkage". And "3" is what you want.

    2. Output constant
    Similarly, float is not allowed as template argument. And the compiler will show the float in the error message. This can be used to output float constant. Unfortunately, VC will implicitly convert float into int when it is used as template argument (see https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=296008 for details)
    Here is another trick to output integral constant:

    template<int N>
    class A
    {
    private:
        A();
    };
    int main()
    {
        A<sizeof(char)> a;
    }

    3. Output type
    Sometimes, you may not be sure about the type of some variables. Then you can use the following code to detect their type.

    struct Dummy {};
    void Fun(const Dummy &);
    int main()
    {
        Fun(1+1U);
    }

    BTW, gcc will show "not convert `2' to `const Dummy&'", then you can use "template<typename T> void Fun(T);" for the declaration of Fun (that means you can use the above code to output constant in gcc)

    4. Find out include path
    For STL, you can use

    #include <ext/hash_set>
    using __gnu_cxx::hash_set; 

    int main()
    {
        hash_set<> m;
    }

    vector is OK, too.

    Now it's time to find out the compiler information. To find out the compiler version, you can check this page: Pre-defined Compiler Macros

    The information of dinkumware online test:

    1. VC
    Version (_MSC_FULL_VER):
             VC8 140050727
             VC7.1 13103077
             VC7 13009466
             VC6 12008804
    Include path:
             D:\cplt501_vc_source\include (with _CPPLIB_VER=501)

    2. EDG
    Version (__EDG_VERSION__):
             3.05 
    Compile option:
             edgcc --alt -D_WIN32 -D_M_IX86 --no_inlining --diag_suppress=1290 --long_long --wchar_t_keyword -D_C99 -ID:\cplt501_gen_source/include/c -ID:\cplt501_gen_source/include -IC:\PROGRA~1\MICROS~2.NET\VC7/include --exceptions --microsoft -c sourceFile.cpp
    Because VC compatible mode is used, it may emulate bugs in VC, too

    3. GCC
    Version:
             3.2.0
    Include path:
             D:/cplt501_gen_source/include and D:/Mingw/include/c++/3.2/
    This version is really out of date


© 2008 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker