This page explains the digitization process. First, the standard workflow is presented, then more detailed information on the different steps involved is given.
This section describes the standard workflow when digitizing a specimen. Here is a chart illustrating the steps. Below, the blocks in the chart will be explained.
Each specimen that is to be digitized enters the process via one of the inboxes.
First the workplace is prepared. For information on this, please refer to the Institution specific section. After the workplace is set up, the digitization process begins. The specimens to be digitized are to be taken from the inbox with the highest priority.
Before actual specimens are digitized, the quality of the scan has to be assured. This can involve, depending on the setup:
Steps that should always be taken in some form include:
See institution specific information for examples on how this might be done.
This section is almost entirely institution specific, as the process of taking the pictures depends entirely on the setup. While taking the pictures, please follow the filenaming and scanning conventions.
When the digitization process is finished, make sure that all scanned specimens are filed/labeled in a way that is easily understandable for all involved. This can be done by using log-files and checklist labels.
Administrative steps that are required include: Locking and freeing specimens, executing additional programs and keeping track (log files).
As long as any of the digital checks are missing, a specimen must not be processed further under any circumstances. To guarantee this, such specimens are “locked”: they are kept in place and not moved until the lock is lifted.
Once all digital checks are done, the specimen goes to the visual check. At this point the lock is lifted and there is no longer any restriction on further processing. When a specimen is freed, its images have to be inserted into the archive so that they are visible in the viewer for the visual check.
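The locking rule can be pictured as a small state check. The following Python sketch is only an illustration and not part of any existing tool: the class `Specimen`, the method `record_pass` and the check names `filename`, `thumbnail` and `kew` are hypothetical labels for the three digital checks.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical labels for the three digital checks (filename, thumbnail, Kew).
DIGITAL_CHECKS = frozenset({"filename", "thumbnail", "kew"})


@dataclass
class Specimen:
    number: str
    passed_checks: Set[str] = field(default_factory=set)

    def record_pass(self, check: str) -> None:
        """Note that one digital check has been passed."""
        if check not in DIGITAL_CHECKS:
            raise ValueError(f"unknown digital check: {check!r}")
        self.passed_checks.add(check)

    @property
    def locked(self) -> bool:
        """A specimen stays locked while any digital check is missing."""
        return not DIGITAL_CHECKS <= self.passed_checks


sheet = Specimen("0044765")
sheet.record_pass("filename")
sheet.record_pass("thumbnail")
still_locked = sheet.locked   # the Kew check is still missing
sheet.record_pass("kew")
freed = not sheet.locked      # all checks done, the lock is lifted
```

In this sketch a specimen is freed only as a consequence of passing every check, so nobody can lift the lock by hand while a check is outstanding.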
In order to handle all the different threads that come together during digitization, an effort should be made to keep track of specimens. This includes:
Before inserting an image into the archive, it should be checked for obvious mistakes. This involves three steps:
Open the newly scanned folder with a list view of the specimens. The filenaming conventions require certain characteristics, such as:
If the check is done before the file is inserted into the archive, a wrongly named file can simply be renamed. Otherwise, the case has to be clarified to determine what should be done with the wrongly named file in the archive.
Open the newly scanned folder in thumbnail view. Look to see if the principal conditions that are necessary for a valid image are met:
The final digital check is the Kew check. It is explained in further detail here. The purpose of the Kew check is to find digital errors that the scanner made. These are potentially invisible during the visual check.
Any errors that show on the thumbnail or the Kew check go back to a priority inbox. This is described in Processing of faulty images.
The visual image check is the last check in the digitization process. Once all other checks for a specimen have been made, the image that was taken is compared to the original sheet. This ensures that external people opening the image get the correct visual information. Therefore, this check is only done after importing the pictures into the archive, and is made using the viewer of the database.
The check is based on a general view of the sheet (does it look the same? is it the same?). Pay attention to the following key features:
If something is unclear, it should be discussed with the relevant person. If a problem is found, the specimen reenters the process via this step. Once this check is passed, the specimen is completely digitized.
At this step, all specimens with deficient images are collected. For these, follow these steps:
If the image has already been inserted into the archive, the insert of the rescan will fail. After it has failed, you have to open the images window of the database, and go to the thread-logs. Choose the server where a thread log to the latest import is found. Open this thread log and click on the error message of the concerned rescan. Then confirm to force the import, and click on import pictures.
See further information under Forcing import
Inboxes are the places from which the supply for digitization is fetched. In order for a specimen to be digitized, it has to enter one of these inboxes.
The digitization process is supplied with material to be scanned by potentially more than one source, and the supply is of varying urgency. For example, there might be a request for an image by another institution. In this case, this image is probably more urgent than other material that is to be scanned. Likewise, the control mechanisms might require an image to be scanned again; this too is to be processed with priority. On the other hand, the majority of the material will be provided with no specific priority.
A set of inboxes has to be defined. There is the regular inbox stack and there are priority inboxes.
Most of the specimens will be equally important. These are filed in the regular inbox stack. By construction, this stack should have considerably more throughput than all the priority inboxes combined.
There are several very different reasons why a specimen is given priority over others:
In general, if there is something special about a specimen that is to be digitized, it should be prioritised. Therefore, priority inboxes for different classes of cases have to be introduced. For example:
These inboxes then have to be ranked by priority. Which categories the priority inboxes cover depends on the institution.
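The rule of always taking the next specimen from the highest-priority inbox that is not empty can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the function `take_next`, the inbox names and their ranking are hypothetical, since the real categories and their order are defined by each institution.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Inbox = Tuple[str, Deque[str]]


def take_next(inboxes: List[Inbox]) -> Optional[Tuple[str, str]]:
    """Return (inbox name, specimen) from the first non-empty inbox.

    The list must be ordered from highest to lowest priority.
    Returns None when every inbox is empty.
    """
    for name, queue in inboxes:
        if queue:
            return name, queue.popleft()
    return None


# Hypothetical inboxes, listed from highest to lowest priority.
inboxes = [
    ("rescan", deque(["w_0000002"])),
    ("external request", deque()),
    ("regular", deque(["w_0000001", "w_0000003"])),
]

first = take_next(inboxes)    # comes from the rescan inbox
second = take_next(inboxes)   # priority inboxes are empty, so the regular stack is used
```

Keeping the inboxes in one ordered list means the ranking is stated in exactly one place and can be changed without touching the selection logic.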
The image file has to be saved following a strict naming convention. Otherwise the viewer will not be able to find the image corresponding to a dataset. The naming convention consists of three tokens: the prefix, the body (number) and the postfix. Every file has a prefix and a body; the postfix is optional (depending on the situation).
For example the file w_0044765_01 consists of
w | 0044765 | 01 |
prefix | body | postfix |
The prefix, body and postfix are always separated by underscores.
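Splitting a file name into its tokens can be sketched as follows. This is an illustrative Python helper, not an existing tool: the names `ImageName` and `parse_image_name` are hypothetical, the function expects the file name without its extension, and it deliberately does not validate the content of the individual tokens. Everything after the second underscore is returned as the postfix unchanged.

```python
from typing import NamedTuple, Optional


class ImageName(NamedTuple):
    prefix: str
    body: str
    postfix: Optional[str]


def parse_image_name(stem: str) -> ImageName:
    """Split a file name (without extension) into prefix, body and postfix."""
    # Split on the first two underscores only; the rest stays in the postfix.
    parts = stem.split("_", 2)
    if len(parts) < 2 or any(part == "" for part in parts):
        raise ValueError(f"not a valid image name: {stem!r}")
    postfix = parts[2] if len(parts) == 3 else None
    return ImageName(parts[0], parts[1], postfix)


with_postfix = parse_image_name("w_0044765_01")
without_postfix = parse_image_name("w_0044765")
```

A name without any underscore, such as `w0044765`, raises a `ValueError`, which makes the helper usable as a first filter during the filename check.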
Note: Tabulae have different conventions for each part!
The prefix usually represents the herbarium where the specimen lies.
Examples:
For tabulae, this prefix is always “tab”.
The body is always a number. For specimens, the number consists of the acquisition number, with the following schemes:
For tabulae, the body is the database ID. It is found in the edit specimens-page in the top left corner.
Postfixes are not necessary for basic scans. They are required in certain circumstances. For specimens, there are two reasons for postfixes:
Note that these postfixes can be stacked. So, if for example, something changed on the specimen 1889-0003454 from W in a hidden area:
If there is more than one tabula for a specimen, they are numbered using the two-digit postfix (e.g. tab_000123_01, tab_000123_02, …). Should one of the above cases with revisions or hidden parts apply to tabulae, these images are also numbered in the two-digit postfix.
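The numbering of tabulae can be illustrated with a short Python sketch. The function `tabula_names` is hypothetical. Two points are assumptions rather than rules stated here: the database ID is zero-padded to six digits (taken from the example tab_000123), and a single tabula gets no postfix.

```python
from typing import List


def tabula_names(database_id: int, count: int, width: int = 6) -> List[str]:
    """Build the file names (without extension) for the tabulae of one specimen.

    width=6 is an assumption derived from the example tab_000123.
    """
    if count < 1:
        raise ValueError("a specimen needs at least one tabula to name")
    base = f"tab_{database_id:0{width}d}"
    if count == 1:
        # Assumption: a single tabula needs no postfix.
        return [base]
    # Several tabulae are numbered with a two-digit postfix starting at 01.
    return [f"{base}_{index:02d}" for index in range(1, count + 1)]


names = tabula_names(123, 2)
```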
How the scanning setup is operated depends on the setup itself. The following conventions, however, should be followed largely independently of the concrete setup.
If a specimen has already been databased, always fetch the dataset when preparing the sheet. Depending on the material a quick glance or a more thorough look should be given to the correctness of the dataset. Finally, it is good practice to copy the number for the filename from the dataset, instead of taking it from the sheet (this reduces copy mistakes, but watch out for blanks). Check the dig. image checkbox of the dataset only after finishing the scan.
The image should show as much information of the sheet as possible. To this end, capsules should be emptied onto the sheet so that the contents can be seen. If the specimen cannot be prepared in a way that shows all of the information in one image (a label hides a plant, something is on the back), multiple pictures have to be taken so that all the information is accessible (see filenaming conventions for further information). To make it easier to extract the information from the image, take care that the sheet is prepared in a structured way, and leave space for the color target and the scale, which are placed on it on the table. For cryptogams: if there are both vegetative and fertile parts of the specimen, make sure that they are visible.
The time the specimen lies on the scanning table should be minimized. A color target and a scale should always be added on the specimen for reference. The color target should be placed along the left edge of the sheet, preferably in the lower corner, with the white square pointing to the lower right. The scale should preferably be placed on the right or left edge, in the middle. If there isn't enough space on the sheet, place the target and/or scale next to it. After finishing the scan, mark the specimen and its folder as scanned (if necessary).
Digital errors are barely visible to the naked eye, and are expected to be overlooked. For this reason, a script was devised at Kew to make these errors visible. This script is to be run as standard procedure on all files. This control mechanism is called the Kew check. This check generates distorted files of the original, with a hugely boosted contrast, where digital errors are easily seen.
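Why a contrast boost makes such errors visible can be shown with a toy calculation. The following Python sketch is not the Kew script; the function `boost_contrast` and the gain of 20 are assumptions chosen purely to illustrate the principle on one row of grey values.

```python
from typing import List


def boost_contrast(pixels: List[int], gain: float = 20.0) -> List[int]:
    """Amplify each pixel's deviation from the mean and clip to 0..255."""
    mean = sum(pixels) / len(pixels)
    return [max(0, min(255, round(mean + gain * (p - mean)))) for p in pixels]


# One row of an evenly lit background with a single pixel that is off by one level.
row = [200, 200, 201, 200]
boosted = boost_contrast(row)
spread_before = max(row) - min(row)
spread_after = max(boosted) - min(boosted)
```

In this toy row the faulty pixel differs from its neighbours by a single grey level, which the eye cannot see; after the boost the difference is 20 levels.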
Open Photoshop and go to Automation → Batch execution. There, select the right script, the newly scanned files, and a valid target folder. Now just run the script, and wait until it has generated all the control images.
Use an Image viewer (e.g. IrfanView) to look at each of the control images separately:
Take a look at the original file for everything that looks suspicious. Since the control images are very distorted, this will be a lot in the beginning. But with a little practice, you get a feeling for what is normal.
Once you have a feeling for how a good picture looks in the Kew control image, look for things that don't belong there:
If you have found something suspicious, check back with the original file whether this is truly an error, or you were mistaken. For every image that has an error, get the specimen, delete the picture, and reinsert it into the scanning process (for example into a rescan priority inbox).
Any sheet of paper (including photos) that is not fixed to the specimen is called a tabula here. If a specimen has one or more such tabulae, each has to be digitized on its own. Most important is the unique naming convention for tabulae:
Tabulae need neither color targets nor scales.
This is a work in progress…
The priority inboxes in W are as follows (in order of priority):
Note: Unlike the Pentacon Setup, the iXR/Credo setup does not need any special light conditions. There is no need to switch off any light or warm up the flashes.
Note: Please sign in at the Google Calendar for scanning and fix the time you wish to work on the scanners. There is a very tight schedule for scanning.
!! important: please do NOT personalize any settings of the software !!
Note: Please read the in-depth startup sequence guide if you are not familiar with the system!
Note: Ensure that the camera power supply is turned off before proceeding!
1) Computer startup:
2) Camera startup:
3) Camera activation:
4) Depth of focus check:
Note: please do this only if the picture is not sharp!
6) LCC profile
Note: this step has to be taken every day!
Note: this has to be done BEFORE any pictures are taken and WITHOUT any specimens on the table.
7) Taking pictures
8) Developing
Note: Activate batch processing (top left)!
9) Screen calibration:
Note: this step is taken automatically every second week on Monday.
Manual calibration (if necessary):
screen shots
10) Annotations and troubleshooting
Information can be found on the 'cheat sheet' in the scanning room.
There is a log sheet beside each scanner. For each scanner, write down:
A log file is to be found on google drive for the rescans. Fill in:
Images are copied to an internal server at the end of a scanning day.