Re: Perl hashing speedup ?
- From: "sujit" <sujitkumar.bade@xxxxxxxxx>
- Date: 6 Sep 2006 01:25:29 -0700
Hi,
Yitzchak Scott-Thoennes wrote:
If you are able to upgrade to a recent 5.8.x release, you may see an
improvement, if part of your problem is that hash bucket usage is
disbalanced.
I feel that disbalanced bucket usage / and less number of hash buckets
might be reason for this slow hash creation process.
Regarding this multi-level hash creation at level-1 itself we are
having 209249 unique keys.
Now I have some queries regarding way perl hashing mechanism maps keys
to buckets.
I am not too sure how many default number of buckets will be used to
map this. If number of buckets are very less then it would result in
lots of hash collisions and that will slow down entire thing ( that's
what I feel -:) )
Now can we have this info:
By default perl will be using how many buckets?
Is there any way we can increase numbers of buckets used by perl
hashes?
I got some info from http://www.perl.com/pub/a/2002/10/01/hashes.html
about how perl hashes work internally; but I guess we will be needing
information how to tweak perl hashes for this kind of requirement.
If you can see following stats while inserting initial set of keys (for
310,000 records) things were going quite fine, then there is gradual
increase for subsequent records. I feel this may be due to heavy hash
collisions. Then if you observe stats for record numbers (710000 to
770000); things were going again back on track; although right now I am
not able to confirm reason for same but I feel that this may be the
case where records were getting mapped to relatively non-collision hash
buckets. Hopefully I am thinking in right direction; if that being the
case if we can somehow increase number of hash buckets then things
should go fine.
Following are some stats; when I am creation only one level hash (using
only Key-1)
Rec Nos Time in seconds
10000 : Time: 3
20000 : Time: 2
30000 : Time: 3
40000 : Time: 3
50000 : Time: 3
60000 : Time: 3
70000 : Time: 3
80000 : Time: 3
90000 : Time: 4
100000 : Time: 3
110000 : Time: 3
120000 : Time: 3
130000 : Time: 3
140000 : Time: 2
150000 : Time: 3
160000 : Time: 3
170000 : Time: 3
180000 : Time: 3
190000 : Time: 3
200000 : Time: 3
210000 : Time: 2
220000 : Time: 4
230000 : Time: 3
240000 : Time: 3
250000 : Time: 3
260000 : Time: 4
270000 : Time: 3
280000 : Time: 4
290000 : Time: 3
300000 : Time: 4
310000 : Time: 4
320000 : Time: 5
330000 : Time: 5
340000 : Time: 5
350000 : Time: 6
360000 : Time: 6
370000 : Time: 7
380000 : Time: 7
390000 : Time: 8
400000 : Time: 8
410000 : Time: 8
420000 : Time: 11
430000 : Time: 12
440000 : Time: 12
450000 : Time: 14
460000 : Time: 14
470000 : Time: 14
480000 : Time: 14
490000 : Time: 14
500000 : Time: 17
510000 : Time: 17
520000 : Time: 20
530000 : Time: 18
540000 : Time: 21
550000 : Time: 21
560000 : Time: 22
570000 : Time: 22
580000 : Time: 22
590000 : Time: 24
600000 : Time: 26
610000 : Time: 24
620000 : Time: 26
630000 : Time: 25
640000 : Time: 27
650000 : Time: 27
660000 : Time: 27
670000 : Time: 29
680000 : Time: 30
690000 : Time: 30
700000 : Time: 32
710000 : Time: 5
720000 : Time: 3
730000 : Time: 3
740000 : Time: 3
750000 : Time: 3
760000 : Time: 2
770000 : Time: 3
780000 : Time: 17
790000 : Time: 35
800000 : Time: 38
810000 : Time: 36
820000 : Time: 38
830000 : Time: 36
840000 : Time: 39
850000 : Time: 40
860000 : Time: 40
870000 : Time: 41
880000 : Time: 42
890000 : Time: 41
900000 : Time: 41
910000 : Time: 43
920000 : Time: 43
930000 : Time: 41
940000 : Time: 43
950000 : Time: 41
960000 : Time: 46
970000 : Time: 44
980000 : Time: 45
990000 : Time: 46
1000000 : Time: 45
1010000 : Time: 43
1020000 : Time: 44
1030000 : Time: 47
1040000 : Time: 46
1050000 : Time: 44
1060000 : Time: 48
1070000 : Time: 49
1080000 : Time: 50
1090000 : Time: 50
1100000 : Time: 50
1110000 : Time: 52
1120000 : Time: 50
1130000 : Time: 47
Thanks
Sujit
Yitzchak Scott-Thoennes wrote:
sujit wrote:.
This is regarding perl hashing. I am creating a 3 level hash (actually
2 such hashes at a time) with approx. 2.6 million entries.
Regarding length of each key;
Level 1 key length - Approx 40 chars
Level 2 Key length - Approx 12 chars
Level 3 key length - Approx 40 chars
Now this hash creation process itself is taking around 3.5 to 4 hrs for
me.
Is there any way to improve per default hashing function for speed-up
(or some better way to approach this requirement)? Any URL's which
could be helpful on same are welcomed -:)
Perl version information-
perl, v5.6.1 built for MSWin32-x86-multi-thread
If you are able to upgrade to a recent 5.8.x release, you may see an
improvement, if part of your problem is that hash bucket usage is
disbalanced.
Another idea would be to revert to perl4-style "emulated" multi-level
hashes; see the syntax description in
http://perldoc.perl.org/perlvar.html#%24%3B
- Follow-Ups:
- Re: Perl hashing speedup ?
- From: Ilya Zakharevich
- Re: Perl hashing speedup ?
- References:
- Perl hashing speedup ?
- From: sujit
- Re: Perl hashing speedup ?
- From: Yitzchak Scott-Thoennes
- Perl hashing speedup ?
- Prev by Date: Re: Perl hashing speedup ?
- Next by Date: Re: Perl hashing speedup ?
- Previous by thread: Re: Perl hashing speedup ?
- Next by thread: Re: Perl hashing speedup ?
- Index(es):
Relevant Pages
|
|