Create multiple threads from within your application
When I posted this Sample program to create multiple threads, I knew the inevitable follow-up question was “can I run my VFP code in separate threads?”. Sure enough, several people asked, citing various valid usage scenarios.
Below is a class that you can use to run your VFP code in multiple separate threads. It can create as many threads as you like, each of which is running VFP code. Because it is a multithreaded sample, it requires the multithreaded runtime, which is just a few megabytes.
The sample code uses the class by creating a routine called MyThreadFunc, which is a CPU intensive task that sums the integers from 1 to some large fixed number and then inserts the result into a table. This task is repeated a few times. The time is measured to call MyThreadFunc from N+1 different threads (including the main thread). That time is compared with calling the same code the same number of times, but only from the main thread.
On my dual processor machine (with hyperthreading on), with 10 threads, the performance was almost double the performance of a single thread, as expected. On my single processor laptop, the performance is roughly the same, with the single threaded slightly faster than the multithreaded, due to thread overhead.
If I change the task to be less CPU intensive and more shared resource intensive by making it update a single shared table multiple times, the performance gain decreases, due to contention for a single shared resource as expected.
The code requires that you have a multithreaded COM DLL built from this code: Blogs get 300 hits per hour: Visual FoxPro can count. The Ic1 interface and the MyDoCmd method are defined in that DLL.
This code really maxes out your CPU when running with many threads. In fact, while I had task man open, I saw the CPU usage at very low numbers for an instant when I expected it to be very high because Taskman didn’t get enough CPU to update its display!
Observe from the results table that various threads complete at various times, interweaving their results, meaning that in the middle of computation, the threads are swapped out.
What kind of performance numbers do you see?
Here’s how ThreeadClass works: it allocates memory for various items, such as GUIDs, strings, and the generated code. It generates machine code into a string, and calls CreateThread, pointing to that string as the Thread procedure to execute. The threads are stored in an array. CoCreateInstance is called to create an instance of the VFP COM object. That object’s MyDoCmd method is called via early binding through it’s vTable.
The vTable of the Ic1 interface (which inherits from IDispatch, which inherits from IUnknown) is expected to be mapped out like this:
0 QueryInterface IUnknown
1 AddRef IUnknown
2 Release IUnknown
3 GetTypeInfoCount IDispatch
4 GetTypeInfo IDispatch
5 GetIDsOfNames IDispatch
6 Invoke IDispatch
7 MyDoCmd Ic1
8 MyEval Ic1
For more on vTable layout see my Paper on Visual Foxpro and Advanced COM
MyDoCmd is entry #7. 7 * 4 bytes per pointer = 28, which is 1c in hex.
sCode = sCode + CHR(0xff) + CHR(0x50) + CHR(0x1c) && call [eax+1ch] && call indirect the function at 1ch in the vTable
Likewise, the Release is entry #2, so the offset is 8.
The preamble does a PUSH EBP, MOV EBP, ESP to set up a stack frame. At the end, there is a MOV ESP, EBP, POP EBP and RET 4 to release the frame. The stack frame means local variables can be accessed via a negative offset from EBP and any parameters with a positive offset. The RET 4 pops off the single 4 byte argument (bstrArg) that was passed in and returns, terminating the thread.
If your t1.dll is in a COM + application (as mine is on some machines), then you will need to specify your particular Guid for Ic1 (found in your t1.vbr file) instead of IDispatch. Then COM will know to marshall the rest of the vTable. In a COM+ application the ProcessIds will be different:
x=CREATEOBJECT("t1.c1")
?_vfp.ProcessId, x.MyEval("_vfp.ProcessId")
A COM+ app also might not have rights to write to the disk, and may not shut down the server when you expect: it might keep an instance around for quick activation. It also will probably have a different Current Directory, so may not find MyThreadProc.prg
All threads in the sample run the same code, but you can certainly make them run different code various ways. Also, the threads can be kept alive in a thread pool, perhaps waiting for more tasks to execute.
As an interesting exercise, try making the thread procedure just Sleep 10 seconds
IF .t. OR p2="Thread"
DECLARE integer Sleep IN WIN32API integer
Sleep(10000)
ENDIF
With 10 threads, the 11 calls to ThreadProc (including the one in the main thread), the Sleeps occur in parallel and it finishes in 10 seconds. In a single threaded app, it takes 110 seconds to execute them in series!
The machine code generated is similar to this pseudo C++ code which has no error checking
DWORD WINAPI ThreadProc(LPVOID dwParam)
{
IDispatch *ptrCOMObj;
VARIANT vtResult;
CoInitialize(0); // initialize COM
CoCreateInstance(cClsId,0, CLSCTX_INPROC_SERVER+CLSCTX_LOCAL_SERVER, &cIid, &ptrCOMObj);
VariantInit(&vtResult);
ptrComObj->MyDoCmd(bstrCmd("do d:\fox90\test\MyThreadFunc WITH p2"), dwParam as Variant,vtEmpty,vtEmpty,vtEmpty,&vtResult);
VariantClear(vtResult); //unused, but needs to be freed
ptrCOMObj->Release(); // release the COM server
CoUninitialize();
}
I used a C++ project in Visual Studio with inline ASM code and Show Disassembly to get the machine language bytes. My Intel Programmers Reference manual was also useful. The thread proc isn’t optimized, but it doesn’t have to be. Almost no time is spent there, compared with calling the COM server.
See also: Windows Security and how it affects running generated code
The VFP code (with minimal error checking):
CLEAR ALL
CLEAR
SET EXCLUSIVE OFF
SET SAFETY OFF
SET ASSERTS ON
CREATE TABLE ThreadLog (threadid i, timestamp t,misc c(50)) && A table into which each thread will insert results
USE ThreadLog && open shared
TEXT TO cstrVFPCode TEXTMERGE NOSHOW && generate the task to run: MyThreadFunc
PROCEDURE MyThreadFunc(p2) && p2 is the 2nd param to MyDoCmd
TRY && use exception handling
DECLARE integer GetCurrentThreadId in WIN32API
LOCAL i,j,k,nSum
FOR k = 1 TO 3 && calculate result 3 times per thread
nSum=0
FOR i = 1 TO 500000
nSum=nSum+i
ENDFOR
INSERT INTO <<DBF()>> VALUES (GetCurrentThreadId(), DATETIME(),TRANSFORM(p2)+":"+TRANSFORM(nSum))
ENDFOR
CATCH TO oex
INSERT INTO <<DBF()>> VALUES (GetCurrentThreadId(), DATETIME(),p2+" Error: "+oex.message)
ENDTRY
RETURN
ENDTEXT
STRTOFILE(cstrVFPCode,"MyThreadFunc.prg")
COMPILE MyThreadFunc.prg
nThreads=10 && WaitForMultipleObjects MAXIMUM_WAIT_OBJECTS = 64
nStart=SECONDS()
ox=CREATEOBJECT("ThreadClass")
cStrIid="{00020400-0000-0000-C000-000000000046}" && IID_IDispatch
* cStrIid="{3608114E-633A-44FF-8E51-1BBCF7225018}" && IID_Ic1 from your t1.vbr file.
ox.StartThreads(nThreads,"do "+SYS(5)+CURDIR()+"MyThreadFunc WITH p2","Thread: %threadnum",cStrIid)
?TRANSFORM(nThreads)+" threads created. Main thread calculating..."
MyThreadFunc("Main") && main thread will do calculation too, rather than just being idle
?"Main thread done: waiting for other threads"
ox.WaitForThreads("?'main thread waiting'") && wait til threads finish
?"Using "+TRANSFORM(nThreads)+" threads takes "+TRANSFORM(SECONDS()-nStart)+" seconds"
?"Now try single threaded:"
nStart=SECONDS()
FOR i = 1 TO nThreads+1 && add one for the main thread
MyThreadFunc("SingleThread")
ENDFOR
?"Single thread takes "+TRANSFORM(SECONDS()-nStart)+" seconds"
LOCATE && Go to the first record
BROWSE LAST NOWAIT && show the results
#define CREATE_SUSPENDED 0x00000004
#define INFINITE 0xFFFFFFFF
#define WAIT_TIMEOUT 258
#define CLSCTX_INPROC_SERVER 1
#define CLSCTX_LOCAL_SERVER 4
#define VT_BSTR 8
DEFINE CLASS ThreadClass as session
hProcHeap =0
nThreads=0
DIMENSION hThreads[1] && Handle to each thread
cThreadHandles="" && Handle to each thread as a string rep of an int array
PROCEDURE Init
DECLARE integer LoadLibrary IN WIN32API string
DECLARE integer FreeLibrary IN WIN32API integer
DECLARE integer GetProcAddress IN WIN32API integer hModule, string procname
DECLARE integer CreateThread IN WIN32API integer lpThreadAttributes, ;
integer dwStackSize, integer lpStartAddress, integer lpParameter, integer dwCreationFlags, integer @ lpThreadId
DECLARE integer ResumeThread IN WIN32API integer thrdHandle
DECLARE integer CloseHandle IN WIN32API integer Handle
DECLARE integer GetProcessHeap IN WIN32API
DECLARE integer HeapAlloc IN WIN32API integer hHeap, integer dwFlags, integer dwBytes
DECLARE integer HeapFree IN WIN32API integer hHeap, integer dwFlags, integer lpMem
DECLARE integer WaitForSingleObject IN WIN32API integer hHandle, integer dwMilliseconds
DECLARE integer WaitForMultipleObjects IN WIN32API integer nCount, string pHandles, integer bWaitAll, integer dwMsecs
DECLARE integer CLSIDFromProgID IN ole32 string lpszProgID, string @ strClSID
DECLARE integer CLSIDFromString IN ole32 string lpszProgID, string @ strClSID
DECLARE integer SysAllocString IN oleaut32 string wstr
DECLARE integer SysFreeString IN oleaut32 integer bstr
CREATE CURSOR memAllocs (memPtr i, AllocType c(1)) && track mem allocs that need to be freed: H=Heap,B=BSTR,L=Library
this.hProcHeap = GetProcessHeap()
PROCEDURE StartThreads(nThreads as Integer, ThreadCmd as String, ThreadProcParam as String,cStrIid as String )
this.nThreads = nThreads
cClsId=SPACE(16)
IF CLSIDFromProgID(STRCONV("t1.c1"+CHR(0),5),@cClsId)!= 0 && dual interface
?"Error: class not found"
RETURN
ENDIF
cIid=SPACE(16)
CLSIDFromString(STRCONV(cStrIid+CHR(0),5),@cIid)
nLocals = 30 && sufficiently large for local vars
sCode="" && generate machine code for thread procedure into a string
* sCode = sCode + CHR(0xcc) && int 3 DebugBreak() to attach a debugger
sCode = sCode + CHR(0x55) && push ebp
sCode = sCode + CHR(0x8b) + CHR(0xec) && mov ebp, esp
sCode = sCode + CHR(0x81) + CHR(0xec)+BINTOC(nLocals * 4, "4rs") && sub esp, nLocals
sCode = sCode + CHR(0x6a) + CHR(0x00) && push 0
sCode = sCode + this.CallDllFunction("CoInitialize", "ole32")
sCode = sCode + this.GenCodeAtPoint("BeforeStart")
sCode = sCode + CHR(0x8d) + CHR(0x45)+CHR(0xf0) && lea eax, [ebp-10h] && addr to put COM ptr
sCode = sCode + CHR(0x50) && push eax