Wednesday, January 2, 2013

California Floristics – data analysis progress of 2 million herbarium specimens


I could not find an overall assessment of data entry progress on the Consortium of California Herbaria (CCH) website, so I compiled the statistics below. According to the 18 October 2012 summary by institution for the 20 listed herbaria, 70.1% of the reported 2,215,256 California specimens (1,552,984) have been databased.

That is an important and remarkable proportion.

Until now, assessment of the overall pattern of endemism and endangerment in the California flora has been largely a personalized, experiential process. Data crunching has played a minor role here and there in papers, yes. But largely the classic papers and patterns are the product of botanists summarizing their field experience. Miles driven. Campfires lit. Cans of beans consumed.

At this juncture, I suggest that a second aspect of herbarium specimen digitization is needed: analysis. We have 70% of the data. Now we need to supplement further data capture with data rectification: cleaning up heteroduplicates, data entry errors, incomplete dates or collection numbers, inconsistencies in data entry between herbaria, incorrect counties, and so on. Georeferencing is also well along.
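As a rough illustration of what such rectification checks might look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the record fields (`id`, `herbarium`, `collector`, `number`, `date`, `county`), the herbarium codes `H1` and `H2`, and the sample records are invented and do not reflect the actual CCH data format. It flags records with an incomplete date or a missing collection number, and it lists groups of records that share a collector and collection number across herbaria but disagree on date or county.

```python
import re
from collections import defaultdict

# A complete date is assumed here to be ISO-style YYYY-MM-DD.
FULL_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def normalize_collector(name):
    """Crude normalization: lowercase and keep letters only."""
    return re.sub(r"[^a-z]", "", name.lower())


def flag_incomplete(records):
    """Map record id -> list of problems (incomplete date, missing number)."""
    flagged = {}
    for rec in records:
        problems = []
        if not FULL_DATE.match(rec.get("date", "")):
            problems.append("incomplete date")
        if not rec.get("number", "").strip():
            problems.append("missing collection number")
        if problems:
            flagged[rec["id"]] = problems
    return flagged


def duplicate_conflicts(records, fields=("date", "county")):
    """Find records with the same collector and collection number held at
    more than one herbarium, and report which fields disagree among them."""
    groups = defaultdict(list)
    for rec in records:
        number = rec.get("number", "").strip()
        if number:  # records without a number cannot be matched this way
            key = (normalize_collector(rec.get("collector", "")), number)
            groups[key].append(rec)

    results = []
    for group in groups.values():
        if len({rec["herbarium"] for rec in group}) < 2:
            continue
        differing = [f for f in fields if len({rec.get(f) for rec in group}) > 1]
        if differing:
            results.append(([rec["id"] for rec in group], differing))
    return results


# Invented sample records, for illustration only.
SAMPLE = [
    {"id": "H1-0001", "herbarium": "H1", "collector": "A. Collector",
     "number": "1234", "date": "1975-05-17", "county": "Kern"},
    {"id": "H2-0456", "herbarium": "H2", "collector": "A Collector",
     "number": "1234", "date": "1975-05", "county": "Tulare"},
    {"id": "H1-0002", "herbarium": "H1", "collector": "B. Botanist",
     "number": "", "date": "1980-06-01", "county": "Inyo"},
]

if __name__ == "__main__":
    print(flag_incomplete(SAMPLE))
    print(duplicate_conflicts(SAMPLE))
```

Matching on collector plus collection number is only one possible heuristic; real collector names and numbers vary far more than this normalization handles, so anything flagged this way would still need review by someone who knows the specimens.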

The CCH summary statistics for the 20 institutions, as of 18 October 2012, are:

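| Herbarium | CA entered | Total CA | CA to go | % entered |
|-----------|-----------:|---------:|---------:|----------:|
| CAS/DS    |     151105 |   520000 |   368895 |      29.1 |
| CDA       |      24117 |    30000 |     5883 |      80.4 |
| CHSC      |      69658 |    73381 |     3723 |      94.9 |
| CSUSB     |       2003 |     4800 |     2797 |      41.7 |
| DAV       |      70824 |   150000 |    79176 |      47.2 |
| HSC       |      68758 |    80000 |    11242 |      85.9 |
| IRVC      |       5675 |    30100 |    24425 |      18.9 |
| OBI       |      11628 |    56000 |    44372 |      20.8 |
| PGM       |       7591 |     7600 |        9 |      99.9 |
| RSA/POM   |     384941 |   425958 |    41017 |      90.4 |
| SBBG      |      93822 |   120000 |    26178 |      78.2 |
| SCFS      |       1000 |     3000 |     2000 |      33.3 |
| SD        |     110851 |   111000 |      149 |      99.9 |
| SDSU      |      16281 |    16281 |        0 |     100.0 |
| SJSU      |       9556 |    10136 |      580 |      94.3 |
| UC/JEPS   |     359000 |   360000 |     1000 |      99.7 |
| UCR       |     134204 |   134500 |      296 |      99.8 |
| UCSB      |      17905 |    65000 |    47095 |      27.5 |
| UCSC      |       6187 |     9500 |     3313 |      65.1 |
| YM        |       7878 |     8000 |      122 |      98.5 |
| TOTAL     |    1552984 |  2215256 |   662272 |      70.1 |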
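For anyone who wants to re-check or update these figures, the totals and percentages can be re-derived from the per-institution counts. The following is a minimal Python sketch; the numbers are copied from the table above, while the names `CCH_SUMMARY`, `summarize`, and `least_complete` are my own and purely illustrative.

```python
# (herbarium, CA specimens entered, total CA specimens), from the
# CCH summary by institution dated 18 October 2012.
CCH_SUMMARY = [
    ("CAS/DS", 151105, 520000),
    ("CDA", 24117, 30000),
    ("CHSC", 69658, 73381),
    ("CSUSB", 2003, 4800),
    ("DAV", 70824, 150000),
    ("HSC", 68758, 80000),
    ("IRVC", 5675, 30100),
    ("OBI", 11628, 56000),
    ("PGM", 7591, 7600),
    ("RSA/POM", 384941, 425958),
    ("SBBG", 93822, 120000),
    ("SCFS", 1000, 3000),
    ("SD", 110851, 111000),
    ("SDSU", 16281, 16281),
    ("SJSU", 9556, 10136),
    ("UC/JEPS", 359000, 360000),
    ("UCR", 134204, 134500),
    ("UCSB", 17905, 65000),
    ("UCSC", 6187, 9500),
    ("YM", 7878, 8000),
]


def summarize(rows):
    """Return (entered, total, to_go, percent_entered) across all rows."""
    entered = sum(row[1] for row in rows)
    total = sum(row[2] for row in rows)
    return entered, total, total - entered, round(100 * entered / total, 1)


def least_complete(rows, n=5):
    """Return the n rows with the lowest fraction of specimens entered."""
    return sorted(rows, key=lambda row: row[1] / row[2])[:n]


if __name__ == "__main__":
    entered, total, to_go, pct = summarize(CCH_SUMMARY)
    print(f"{entered} of {total} entered ({pct}%), {to_go} to go")
    for name, done, size in least_complete(CCH_SUMMARY):
        print(f"{name}: {100 * done / size:.1f}% entered, {size - done} to go")
```

Sorting by fraction entered is just one way to look at the backlog; sorting by the absolute number of specimens still to go would highlight the largest collections instead.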


