
Digitization

This page explains the digitization process. First, the standard workflow is presented; then the individual steps are described in more detail.

Workflow

This section describes the standard workflow when digitizing a specimen. Here is a chart illustrating the steps. Below, the blocks in the chart will be explained.

Figure: digitisation workflow diagram (English)

Preparation

Each specimen that is to be digitized enters the process via one of the inboxes.

First the workplace is prepared. For information on this, please refer to the Institution specific section. After the workplace is set up, the digitization process begins. The specimens to be digitized are to be taken from the inbox with the highest priority.

Scanning


Preparing the scanning setup

Before actual specimens are digitized, the quality of the scan has to be assured. This can involve, depending on the setup:

Steps that should always be taken in some form include:

See institution specific information for examples on how this might be done.


Taking the pictures

This section is almost entirely institution specific, as the process of taking the pictures depends entirely on the setup. While taking the pictures, please follow the filenaming and scanning conventions.


Ending the scanning

When the digitization process is finished, make sure that all scanned specimens are filed/labeled in a way that is easily understandable for all involved. This can be done by using log-files and checklist labels.

Administrative steps

Administrative steps that are required include locking and freeing specimens, executing additional programs, and keeping track (log files).

Locking specimens

As long as any of the digital checks is missing, the specimen must not be processed further under any circumstances. To guarantee this, the specimen is “locked”: it is kept in place and not moved until the lock is lifted.

Freeing specimens

Once all digital checks are done, the specimen goes to the visual check. At this point, the lock is lifted and there is no further restriction on processing. Together with the freeing, the images have to be inserted into the archive so that they are visible in the viewer for the visual check.
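The locking rule can be illustrated with a small sketch. It assumes that the digital checks are the three described under Digital image check (filename, thumbnail, Kew); the class and field names are hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass, field

# The three digital checks named in this document.
DIGITAL_CHECKS = ("filename", "thumbnail", "kew")


@dataclass
class SpecimenStatus:
    """Hypothetical record tracking the lock state of one specimen."""
    specimen_id: str
    passed_checks: set = field(default_factory=set)

    def record_check(self, check: str) -> None:
        if check not in DIGITAL_CHECKS:
            raise ValueError(f"unknown digital check: {check}")
        self.passed_checks.add(check)

    @property
    def locked(self) -> bool:
        # Locked as long as at least one digital check is missing.
        return not set(DIGITAL_CHECKS) <= self.passed_checks


status = SpecimenStatus("w_0044765")
status.record_check("filename")
status.record_check("thumbnail")
print(status.locked)  # True
status.record_check("kew")
print(status.locked)  # False
```

A record like this could back a log file or checklist label; how the lock is actually tracked is up to the institution.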

Additional programs

Keeping track (log files)

In order to handle all the different threads that come together during digitization, an effort should be made to keep track of specimens. This includes:

Digital image check

Before inserting an image into the archive, it should be checked for obvious mistakes. This involves three steps:

Filename check

Open the newly scanned folder in a list view of the specimens. The filenames have to comply with the filenaming conventions, such as:

If the check is done before the image is inserted into the archive, the file can simply be renamed. Otherwise, the case should be followed up to clarify what is to be done with the wrongly named file in the archive.
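For larger folders, the filename check could be supported by a script. The following is a minimal sketch, assuming a lowercase prefix, a numeric body and optional postfix tokens joined by underscores; the pattern, the function name and the .tif extension in the example are assumptions and would need to be adapted to the institution's actual rules.

```python
import re

# Assumed pattern: lowercase prefix, numeric body, optional postfix tokens,
# all joined by underscores. Adjust to the institution's actual rules.
NAME_PATTERN = re.compile(r"^[a-z]+_\d+(_[a-z0-9]+)*$")


def find_bad_filenames(filenames):
    """Return the file names whose stem breaks the assumed pattern."""
    bad = []
    for name in filenames:
        stem = name.rsplit(".", 1)[0]  # drop the file extension, if any
        if not NAME_PATTERN.match(stem):
            bad.append(name)
    return bad


print(find_bad_filenames(["w_0044765_01.tif", "w 0044765.tif", "tab_000123_02.tif"]))
# ['w 0044765.tif']
```

A check like this only catches formal mistakes such as blanks; whether the number belongs to the right specimen still has to be verified by a person.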


Thumbnail check

Open the newly scanned folder in thumbnail view. Check whether the basic conditions for a valid image are met:


The Kew check

The final digital check is the Kew check. It is explained in further detail under Kew check. The purpose of the Kew check is to find digital errors that the scanner made. These are potentially invisible during the visual check.

Any errors that show on the thumbnail or the Kew check go back to a priority inbox. This is described in Processing of faulty images.

Visual image check

The visual image check is the last check in the digitization process. Once all other checks for the specimen have been made, the image that was taken is compared to the original sheet. This ensures that external people opening the image get the correct visual information. The check is therefore only done after the pictures have been imported into the archive, and it is made using the viewer of the database.

The check is based on a general view of the sheet (does it look the same? is it the same?). Pay attention to the following key features:

If something is unclear, it should be discussed with the relevant person. If a problem is found, the specimen reenters the process via this step. Once this check is passed, the specimen is completely digitized.

Processing of faulty images

At this step, all specimens with deficient images are collected. For these, follow these steps:

Inserting rescans into the archive

If the image has already been inserted into the archive, the insert of the rescan will fail. After it has failed, you have to open the images window of the database, and go to the thread-logs. Choose the server where a thread log to the latest import is found. Open this thread log and click on the error message of the concerned rescan. Then confirm to force the import, and click on import pictures.

See further information under Forcing import.

Information

Inboxes

Inboxes are the place from which the supply for digitization is fetched. In order for a specimen to be digitized, it has to enter one of these inboxes.

The digitization process may be supplied with material from more than one source, and the supply is of varying urgency. For example, there might be a request for an image from another institution; such an image is probably more urgent than other material that is to be scanned. Likewise, the control mechanisms might require an image to be scanned again, and this rescan is also to be processed with priority. The majority of the material, on the other hand, will be provided with no specific priority.

A set of inboxes has to be defined. There is the regular inbox stack and there are priority inboxes.

Regular inbox stack

Most of the specimens will be equally important. These are filed in the regular inbox stack. By construction, this stack should have considerably more throughput than all the priority inboxes combined.

Priority inboxes

There are several very different reasons why a specimen is given priority over others:

In general, if there is something special about a specimen that is to be digitized, it should be prioritised. Therefore, priority inboxes for different classes of cases have to be introduced. For example:

These inboxes then have to be ordered by priority. Which categories the priority inboxes include depends on the institution.

Filenaming conventions

The image file has to be saved following a strict naming convention. Otherwise the viewer will not be able to find the image corresponding to a dataset. The naming convention consists of three tokens: the prefix, the body (number) and the postfix. Every file has a prefix and a body; the postfix is optional (depending on the situation).

For example, the file w_0044765_01 consists of:

prefix   body      postfix
w        0044765   01

The prefix, body and postfix are always separated by underscores.
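The token structure can be sketched as a small parser. This is only an illustration of the convention described here; the names ImageName and parse_image_name are hypothetical, and the postfix is kept as a tuple on the assumption that a name may carry more than one postfix token.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImageName:
    prefix: str
    body: str
    postfix: Tuple[str, ...]  # empty when the file has no postfix


def parse_image_name(stem: str) -> ImageName:
    """Split a file name (without extension) into its underscore-separated tokens."""
    tokens = stem.split("_")
    if len(tokens) < 2:
        raise ValueError(f"expected at least prefix and body: {stem!r}")
    prefix, body, *postfix = tokens
    if not body.isdigit():
        raise ValueError(f"body must be a number: {stem!r}")
    return ImageName(prefix, body, tuple(postfix))


print(parse_image_name("w_0044765_01"))
# ImageName(prefix='w', body='0044765', postfix=('01',))
```

The body is kept as a string so that leading zeros are preserved.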

Note: Tabulae have different conventions for each part!

The prefix

The prefix usually represents the herbarium where the specimen lies.

Examples:

For tabulae, this prefix is always “tab”.

The body (number)

The body is always a number. For specimens, the number consists of the acquisition number, with the following schemes:

Cases from W:

For tabulae, the body is the database ID. It is found in the edit specimens-page in the top left corner.

The postfix

Postfixes are not necessary for basic scans. They are required in certain circumstances. For specimens, there are two reasons for postfixes:

Note that these postfixes can be stacked. So, if for example, something changed on the specimen 1889-0003454 from W in a hidden area:

  1. This area must already have been digitized as w_18890003454_a
  2. The revision scan then would be w_18890003454_a_01
  3. A further revision would produce w_18890003454_a_02
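The stacking in this example can be sketched as follows. The sketch assumes that a revision is always marked by a trailing two-digit postfix, which is inferred from the example names above; the function name is hypothetical. It would misread a body that itself consists of exactly two digits, so it is not a general solution.

```python
import re


def next_revision_name(stem: str) -> str:
    """Return the file name for the next revision scan of `stem`.

    Assumes a revision is marked by a trailing two-digit postfix, as in the
    examples above; a name without one gets "_01".
    """
    match = re.search(r"_(\d{2})$", stem)
    if match is None:
        return f"{stem}_01"
    number = int(match.group(1)) + 1
    if number > 99:
        raise ValueError("two-digit revision postfix exhausted")
    return f"{stem[:match.start()]}_{number:02d}"


print(next_revision_name("w_18890003454_a"))     # w_18890003454_a_01
print(next_revision_name("w_18890003454_a_01"))  # w_18890003454_a_02
```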

If there is more than one tabula for a specimen, they are numbered using the two-digit postfix (e.g. tab_000123_01, tab_000123_02, …). Should one of the above cases with revisions or hidden parts apply to tabulae, they are also numbered using the two-digit postfix.


Scanning conventions

How the scanning setup is operated depends on the actual setup. These conventions, however, should be followed more or less independently of the concrete setup.

Preparing the dataset

If a specimen has already been databased, always fetch the dataset when preparing the sheet. Depending on the material, the correctness of the dataset should be given a quick glance or a more thorough look. Finally, it is good practice to copy the number for the filename from the dataset instead of taking it from the sheet (this reduces copy mistakes, but watch out for blanks). Check the dig. image checkbox of the dataset only after finishing the scan.

Preparing the sheet

The image should show as much of the information on the sheet as possible. For this, capsules should be emptied onto the sheet so that the contents can be seen. If the specimen cannot be prepared so that all of the information is visible in one image (a label hides a plant, something is on the back side), multiple pictures have to be taken so that all the information is accessible (see filenaming conventions for further information). To make it easier to extract information from the image, take care that the sheet is prepared in a structured way, and leave space for the color target and the scale, which are placed on it on the scanning table. For cryptogams: if the specimen has both vegetative and fertile parts, make sure that both are visible.

Working on the scanning table

The time the specimen lies on the scanning table should be minimized. A color target and a scale should always be added on the specimen for reference. The color target should be placed along the left edge of the sheet, preferably in the lower corner, with the white square pointing to the lower right. The scale should preferably be placed on the right or left edge, in the middle. If there isn't enough space on the sheet, place the target and/or scale next to it. After finishing the scan, mark the specimen and its folder as scanned (if necessary).

Kew check

Digital errors are barely visible to the naked eye, and are expected to be overlooked. For this reason, a script was devised at Kew to make these errors visible. This script is to be run as standard procedure on all files. This control mechanism is called the Kew check. This check generates distorted files of the original, with a hugely boosted contrast, where digital errors are easily seen.
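Why a boosted contrast makes such errors visible can be shown with a toy example. This is not the Kew script, whose exact processing is not described here; it is a sketch that assumes a simple linear contrast stretch around a mid-grey value on 8-bit grey values, with a hypothetical function name and arbitrarily chosen gain.

```python
def boost_contrast(pixels, gain=20.0, midpoint=128):
    """Stretch 8-bit grey values around `midpoint` and clamp to 0..255."""
    boosted = []
    for value in pixels:
        stretched = midpoint + gain * (value - midpoint)
        boosted.append(int(max(0, min(255, round(stretched)))))
    return boosted


# A nearly uniform row with one slightly deviating pixel (131 instead of 128).
row = [128, 128, 131, 128]
print(boost_contrast(row))  # [128, 128, 188, 128]
```

A difference of 3 grey levels, which is hard to see in the original, becomes a difference of 60 in the control image.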

Running the script

Open Photoshop and go to Automation → Batch execution. There, select the right script, the newly scanned files, and a valid target folder. Now just run the script, and wait until it has generated all the control images.

Checking the images

Use an image viewer (e.g. IrfanView) to look at each of the control images separately:

Take a look at the original file for everything that looks suspicious. Since the control images are very distorted, this will be a lot in the beginning. But with a little practice, you get a feeling for what is normal.

Once you have a feeling for how a good picture looks in the Kew control image, look for things that don't belong there:

If you have found something suspicious, check back with the original file whether this is truly an error, or you were mistaken. For every image that has an error, get the specimen, delete the picture, and reinsert it into the scanning process (for example into a rescan priority inbox).

Tabulae

Any sheet of paper, including photos, that is not fixed to the specimen is called a tabula here. If a specimen has one or more such tabulae, each has to be digitized on its own. Most important is the unique naming convention for tabulae:

Tabulae need neither color targets nor scales.

Institution specific Information

Herbarium W

This is a work in progress…


W: Priority Inboxes

The priority inboxes in W are as follows (in order of priority):

  1. Unique cases (very rare)
  2. Loan/Image requests
  3. Rescans
  4. Scans before/after separation
  5. New scans of changed material
  6. Regular inbox stack
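The rule that specimens are taken from the inbox with the highest priority can be sketched with the order listed above. The function name and the specimen identifiers are hypothetical, and taking the oldest entry of an inbox first is an assumption; the document does not say in which order specimens leave a single inbox.

```python
from collections import deque

# Inbox names in priority order, taken from the list above (highest first).
W_INBOX_ORDER = [
    "Unique cases",
    "Loan/Image requests",
    "Rescans",
    "Scans before/after separation",
    "New scans of changed material",
    "Regular inbox stack",
]


def next_specimen(inboxes):
    """Pop the next specimen from the highest-priority non-empty inbox."""
    for name in W_INBOX_ORDER:
        queue = inboxes.get(name)
        if queue:
            return name, queue.popleft()
    return None  # every inbox is empty


inboxes = {
    "Regular inbox stack": deque(["w_0044765", "w_0044766"]),
    "Rescans": deque(["w_0012345"]),
}
print(next_specimen(inboxes))  # ('Rescans', 'w_0012345')
print(next_specimen(inboxes))  # ('Regular inbox stack', 'w_0044765')
```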

W: Filenaming conventions


W: Mamiya-Leaf

Read more about the CaptureOne software here and here.

W: Mamiya-Leaf hardware setup & working environment
Note: This section is a work in progress!
Note: Unlike the Pentacon Setup, the iXR/Credo setup does not need any special light conditions. There is no need to switch off any light or warm up the flashes.
Note: Please sign in at the Google Calendar for scanning and fix the time you wish to work on the scanners. There is a very tight schedule for scanning.

!! important: please do NOT personalize any settings of the software !!

W: Mamiya-Leaf day to day workflow roundup
Note: Please read the in-depth startup sequence guide if you are not familiar with the system!
W: Mamiya-Leaf startup sequence
Note: Ensure that the camera power supply is turned off before proceeding!

1) Computer startup:

2) Camera startup:

Figure: CaptureOne software

3) Camera activation:

4) Checking the depth of focus:

Note: please do this only if the picture is not sharp!

5) Creating the LCC raw file

Note: this step has to be taken every day!
Note: this has to be done BEFORE any pictures are taken and WITHOUT any specimens on the table.

6) Taking pictures

7) End session:

If this is not the first session of the day you have to import the LCC-RAW-File:

Note: Activate from menu “Stapelverarbeitung” (upper left)!

8) Trash files, Jacq-Import & Backup

9) Screen calibration:

Note: this step is taken automatically every second week on Monday.

Manual calibration (if necessary):

10) Annotations and troubleshooting

W: Mamiya-Leaf shutdown sequence


W: Pentacon Scanner

W: Setup the cameras

Information can be found on the 'cheat sheet' in the scanning room.

W: Setup calibration

Figure: digitisation calibration

W: Setup scan

Figure: digitisation scan view

W: Day-to-day preparation
  1. Turn on all four scanning lamps, but leave the room lighting off.
    Even if you only use one scanner, all lamps should be turned on, so the scans have a constant brightness.
  2. Wait for about 10 minutes. Do a couple of prescans to warm up the camera.
  3. Create a new folder on the computer for the scanning day. (See filenaming conventions)
  4. Before the first scan, calibrate the scanner.
  5. The first scan of the day is the grey-card with the additional target.
  6. Start the regular scans.
W: Ending a scanning day
  1. Check the Thumbnail view of the day. (Does the coloring look natural, are scale and color target on every scan?)
  2. Are the filenames according to the filenaming conventions?
  3. Turn off all lamps, computers, and everything else (radio, fan, …)
W: Scanner troubleshooting

W: Synchronization

W: Log files

There is a log sheet beside each scanner. For each scanner, write down:

A log file for the rescans can be found on Google Drive. Fill in:

W-Krypt: Saving convention

Images are copied to an internal server at the end of a scanning day.