Quote: Faster than indexing the string as an array
Please allow me to doubt that.
Your code produces (gcc cross-compilation for ARM Cortex-M0, optimization set to -O3):
0000 30B5    push {r4, r5, lr}
0002 0028    cmp r0, #0
0004 13D0    beq .L37
0006 0378    ldrb r3, [r0]
0008 002B    cmp r3, #0
000a 10D0    beq .L37
000c 002A    cmp r2, #0
000e 0ED0    beq .L37
0010 0024    movs r4, #0
0012 02E0    b .L34
.L40:
0014 013A    subs r2, r2, #1
0016 002A    cmp r2, #0
0018 07D0    beq .L38
.L34:
001a CB1A    subs r3, r1, r3
001c 5D42    rsbs r5, r3, #0
001e 6B41    adcs r3, r3, r5
0020 0130    adds r0, r0, #1
0022 E418    adds r4, r4, r3
0024 0378    ldrb r3, [r0]
0026 002B    cmp r3, #0
0028 F4D1    bne .L40
.L38:
002a 2000    movs r0, r4
.L32:
002c 30BD    pop {r4, r5, pc}
.L37:
002e 0020    movs r0, #0
0030 FCE7    b .L32
while the following one
int STR_CountChar_Alt( const char s[], char c, int m )
{
    int count = 0;
    for ( int k = 0; s[k] && (k < m); ++k )
    {
        if ( s[k] == c )
            ++count;
    }
    return count;
}
produces
0000 30B5    push {r4, r5, lr}
0002 0378    ldrb r3, [r0]
0004 002B    cmp r3, #0
0006 10D0    beq .L46
0008 002A    cmp r2, #0
000a 0EDD    ble .L46
000c 0400    movs r4, r0
000e 8218    adds r2, r0, r2
0010 0020    movs r0, #0
0012 02E0    b .L44
.L51:
0014 0134    adds r4, r4, #1
0016 9442    cmp r4, r2
0018 06D0    beq .L42
.L44:
001a CB1A    subs r3, r1, r3
001c 5D42    rsbs r5, r3, #0
001e 6B41    adcs r3, r3, r5
0020 C018    adds r0, r0, r3
0022 6378    ldrb r3, [r4, #1]
0024 002B    cmp r3, #0
0026 F5D1    bne .L51
.L42:
0028 30BD    pop {r4, r5, pc}
.L46:
002a 0020    movs r0, #0
002c FCE7    b .L42
Now, I might be wrong (since I have NOT computed the exact execution cycles, nor have I measured them), but the former assembly code doesn't look faster than the latter.
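For anyone who wants to measure rather than guess, here is a minimal sketch of a timing helper. Note the assumptions: STR_CountChar_Ptr is my own hypothetical reconstruction of a pointer-walking version, inferred from the first listing (the original source is not reproduced in this post), and time_variant and count_fn are made-up names. It uses the standard clock() function, so on a desktop host it says nothing about Cortex-M0 cycles; on the target you would swap in a hardware timer.

```c
#include <stddef.h>
#include <time.h>

/* ASSUMPTION: hypothetical pointer-walking variant, reconstructed from the
   first listing. It is not the original poster's source code. */
int STR_CountChar_Ptr(const char *s, char c, int m)
{
    int count = 0;
    if (s == NULL)
        return 0;
    while (*s != '\0' && m != 0) {
        if (*s == c)
            ++count;
        ++s;
        --m;
    }
    return count;
}

/* Array-indexing variant, as given in the post. */
int STR_CountChar_Alt(const char s[], char c, int m)
{
    int count = 0;
    for (int k = 0; s[k] && (k < m); ++k) {
        if (s[k] == c)
            ++count;
    }
    return count;
}

/* Hypothetical helper type: any function with the signature above. */
typedef int (*count_fn)(const char *, char, int);

/* Calls fn `reps` times and returns the elapsed CPU time in seconds.
   The volatile sink keeps the compiler from discarding the calls. */
double time_variant(count_fn fn, const char *s, char c, int m, long reps)
{
    volatile int sink = 0;
    clock_t start = clock();
    for (long i = 0; i < reps; ++i)
        sink = fn(s, c, m);
    clock_t stop = clock();
    (void)sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_variant(STR_CountChar_Ptr, text, 'o', 100, 1000000L) and then the same with STR_CountChar_Alt gives two numbers to compare. Both variants should of course return the same count for any non-negative m before the timings mean anything.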
I suppose it could be a case of 'premature optimization'.
"What is that on your head, Signor di Ceprano?"
-- Rigoletto