Comments on Developer Blog: GBIF Name Parser
Tim Robertson (http://www.blogger.com/profile/07889700598656669041)
Updated: 2024-03-23T05:59:17.256+01:00

Anonymous (https://www.blogger.com/profile/02525336976753861766), 2017-07-19T17:42:06.918+02:00

I have run more tests, and the GNA parser becomes faster the more names are parsed.
Repeating the same 1380 names 1x, 10x and 100x:

GBIF - total time parsing 1380 names: 1128 ms
GNA - total time parsing 1380 names: 1666 ms

GBIF - total time parsing 13800 names: 13612 ms
GNA - total time parsing 13800 names: 5023 ms

GBIF - total time parsing 138000 names: 148381 ms
GNA - total time parsing 138000 names: 27333 ms

I thought there might be some caching involved in the GNA parser, as the GBIF parsing time is rather linear.
So I've tried 1380 names 1x, 10x and 100x with random binomials including an author (e.g. Zpc aafoax Iiv; Aioaeuzoai eaemeau Oeovzmboular):

GBIF - total time parsing 1380 names: 333 ms
GNA - total time parsing 1380 names: 1516 ms

GBIF - total time parsing 13800 names: 2266 ms
GNA - total time parsing 13800 names: 3511 ms

GBIF - total time parsing 138000 names: 18715 ms
GNA - total time parsing 138000 names: 16223 ms

As you can see, GBIF becomes a lot faster with these simple names. Neither parser scales linearly anymore: GBIF gets slightly faster per name, but the GNA one gets a lot faster. It takes only about 10x more time for 100x the names.

I finally tried a million random names and also added an authorship year, e.g.
Uouixuu eeuouao Vxgoea, 1806

GBIF - total time parsing 1000000 names: 188939 ms
GNA - total time parsing 1000000 names: 113468 ms

As I said, this does not use a test framework, and JVM garbage collection, for example, can happen at any time. It is still interesting behavior: the relative performance of the two parsers depends on the number of names to be parsed and also on the kind of names.

Anonymous (https://www.blogger.com/profile/02525336976753861766), 2017-07-19T15:42:30.982+02:00

The parsing time excludes JVM startup, and the parser instance is created before timing starts, so this is just the parsing time. It was not measured with a proper performance framework, but still. The parsing times depend heavily on the names being parsed. Large authorships with 20 or more authors can slow down the GBIF parser, for example. The test set of 1380 names explicitly tries not to be just simple binomials.

dimus (https://www.blogger.com/profile/16854769465790401881), 2017-07-10T19:14:33.470+02:00

For benchmarking we used 1 000 000 names instead of 1000, because the JVM takes time to load, and because just about any parser is fast enough for 1k names. Both the GBIF and GN parsers are way, way faster than 1000 or 500 names/sec.

dimus (https://www.blogger.com/profile/16854769465790401881), 2017-07-08T11:09:55.614+02:00

Performance-wise, might it be the number of names? We used 1M names for performance tests.
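
For readers who want to try a comparison like the one discussed in these comments, the timing loop could look roughly like the sketch below. It is only an illustration under stated assumptions: the class `NameParserBenchmark`, its helper methods and the stand-in `parser` lambda are hypothetical and are not taken from either the GBIF or the GNA code base. The real parser call would be passed in place of the lambda. The warm-up pass is an added assumption meant to reduce the JIT effects mentioned above; a proper harness such as JMH would still be more reliable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

// Hypothetical benchmark sketch; none of these names come from GBIF or GNA.
class NameParserBenchmark {

    // Written on every parse so the JIT cannot discard the parser call.
    static volatile int sink;

    // Builds a pseudo-word of lowercase letters, optionally capitalised.
    static String randomWord(Random rnd, int length, boolean capitalise) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        if (capitalise) {
            sb.setCharAt(0, Character.toUpperCase(sb.charAt(0)));
        }
        return sb.toString();
    }

    // Random binomial with author and year, shaped like "Uouixuu eeuouao Vxgoea, 1806".
    static String randomName(Random rnd) {
        String genus = randomWord(rnd, 3 + rnd.nextInt(8), true);
        String epithet = randomWord(rnd, 3 + rnd.nextInt(8), false);
        String author = randomWord(rnd, 3 + rnd.nextInt(8), true);
        int year = 1753 + rnd.nextInt(250);
        return genus + " " + epithet + " " + author + ", " + year;
    }

    // Generates a reproducible list of random names for a given seed.
    static List<String> randomNames(int count, long seed) {
        Random rnd = new Random(seed);
        List<String> names = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            names.add(randomName(rnd));
        }
        return names;
    }

    // Times one pass over all names and returns the elapsed milliseconds.
    // The parser is created by the caller, so construction is not measured.
    static long timeParsing(Function<String, Object> parser, List<String> names) {
        long start = System.nanoTime();
        for (String name : names) {
            Object parsed = parser.apply(name);
            sink += (parsed == null) ? 0 : 1;
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Stand-in for a real parser call (assumption): splits on spaces and commas.
        Function<String, Object> parser = name -> name.split("[ ,]+");

        // Warm-up pass so JIT compilation is less likely to distort the first run.
        timeParsing(parser, randomNames(10_000, 1L));

        for (int count : new int[] {1_380, 13_800, 138_000}) {
            List<String> names = randomNames(count, 42L);
            long ms = timeParsing(parser, names);
            System.out.println("total time parsing " + count + " names: " + ms + " ms");
        }
    }
}
```

Running each parser through the same `timeParsing` call with identical name lists keeps the two measurements comparable, although garbage collection pauses can still affect individual runs.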