
Digitization

This page explains the digitization process. First, the standard workflow is presented, then more detailed information on the different steps involved is given.

Workflow

This section describes the standard workflow when digitizing a specimen. Here is a chart illustrating the steps. Below, the blocks in the chart will be explained.

Preparation

Each specimen that is to be digitized enters the process via one of the inboxes.

First the workplace is prepared. For information on this, please refer to the Institution specific section. After the workplace is set up, the digitization process begins. The specimens to be digitized are to be taken from the inbox with the highest priority.

Scanning

Preparing the scanning setup

Before actual specimens are digitized, the quality of the scan has to be assured. This can involve, depending on the setup:

Letting the lighting warm up until its color and brightness are constant.
Making sure all unintended light sources are blocked out of the scanning room, especially sunlight.
Taking some pre-scans to check the functionality of the scanner.

Steps that should always be taken in some form include:

Sticking to a regular calibration scheme
Producing control images of the state of the scanner using e.g. grey cards.

See institution specific information for examples on how this might be done.

Taking the pictures

This section is almost entirely institution specific, as the process of taking the pictures depends entirely on the setup. While taking the pictures, please follow the filenaming and scanning conventions.

Ending the scanning

When the digitization process is finished, make sure that all scanned specimens are filed/labeled in a way that is easily understandable for all involved. This can be done by using log-files and checklist labels.

Administrative steps

Administrative steps that are required include: Locking and freeing specimens, executing additional programs and keeping track (log files).

Locking specimens

As long as some of the digital checks are missing, specimens should not be processed further under any circumstances. To guarantee this, they are “locked”: they are kept in place and not moved until the lock is lifted.

Freeing specimens

Once all digital checks are done, the specimen goes to the visual check. At this point, the lock is lifted and there is no longer any restraint on further processing. Together with the freeing, the images have to be inserted into the archive, so they are visible in the viewer for the visual check.

Additional programs

After the specimen has been locked, the Kew script has to be run.
After the specimen is freed, it has to be inserted into the archive.
other steps (institution specific)

Keeping track (log files)

In order to handle all the different threads that come together during digitization, effort to keep track of specimens should be made. This includes:

Clearly marking the inboxes
Clearly marking the stage in which specimens are
Clearly marking if specimens are scanned, and which checks are done or missing.
Organizing the scans in a folder structure (a folder per day has worked well).
Keeping log files of all essential stages (which programs have been executed…etc.).

Digital image check

Before inserting an image into the archive, it should be checked for obvious mistakes. This involves three steps:

Filename check

Open the newly scanned folder with a list view of the specimens. According to the filenaming conventions, the names have to meet certain requirements, such as:

The prefix must be valid (look also for uppercase vs. lowercase)
The parts (one specimen divided onto several sheets) must be separated by underscores (these are sometimes replaced with hyphens by mistake)
The filename must have a certain number of digits. See Herbarium specific Information
Look for fitting postfix combinations (e.g. if there is a w_18890004354_a scanned, in most cases there should also be a w_18890004354 scanned on the same day; it is rare to have the postfix _02 or higher on anything but tabulae). If any of those cases comes up, check to see whether it is a mistake or intentional.
Sometimes, spaces are inserted by mistake. Check whether the length of the name lines up with the other files.

If the check is done before the image is inserted into the archive, the file can simply be renamed. Otherwise, check back to clarify what is to be done with the wrongly named file in the archive.
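Parts of this check can be automated. The following is a minimal Python sketch, not part of the JACQ tooling: the function name suspicious_names and the exact rules are assumptions derived from the list above. It takes the file names of one scanning day (without extension) and flags whitespace, uppercase letters, and lettered part scans that have no base scan in the same folder. A flag is only a hint to look again, not proof of a mistake.

```python
import re

# Hypothetical helper. The rules below are assumptions based on the
# checklist above, not an official validator.
PART_SUFFIX = re.compile(r"^(?P<base>.+)_[a-z]$")

def suspicious_names(names):
    """Return a dict mapping each suspicious name to a list of reasons."""
    present = set(names)
    report = {}
    for name in names:
        reasons = []
        if any(ch.isspace() for ch in name):
            reasons.append("contains whitespace")
        if name != name.lower():
            reasons.append("contains uppercase letters")
        match = PART_SUFFIX.match(name)
        if match and match.group("base") not in present:
            reasons.append("part scan without base scan in this folder")
        if reasons:
            report[name] = reasons
    return report

# Made-up names for illustration only.
day = ["w_18890004354", "w_18890004354_a", "w_0044765_b", "W_0044766", "w_0044 767"]
for name, reasons in suspicious_names(day).items():
    print(name, "->", ", ".join(reasons))
```

Whether a flagged name is really wrong still has to be decided by looking at the specimen.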

Thumbnail check

Open the newly scanned folder in thumbnail view. Look to see if the principal conditions that are necessary for a valid image are met:

Does the scan show all necessary items (color target, scale)?
Does the coloring seem unnatural?
Is an image obviously cropped at the borders?

The Kew check

The final digital check is the Kew check. It is explained in further detail here. The purpose of the Kew check is to find digital errors that the scanner made. These are potentially invisible during the visual check.

Any specimen whose image shows errors in the thumbnail check or the Kew check goes back to a priority inbox. This is described in Processing of faulty images.

Visual image check

The visual image check is the last check in the digitization process. Once all other checks for a specimen have been made, the image is compared to the original sheet. This ensures that external people opening the image get the correct visual information. Therefore, this check is done only after the pictures have been imported into the archive, and it is made using the viewer of the database.

The check is based on a general view of the sheet (does it look the same? is it the same?). Pay attention to the following key features:

Is any edge of the specimen cropped, is the whole sheet visible?
Are the necessary accessories (color target, scale) visible?
If there is something hidden, are there additional postfixed scans?
Is the specimen correctly labeled?
Is there a deviation in color?

If something is unclear, it should be discussed with the relevant person. If a problem is found, the specimen reenters the process via this step. Once this check is passed, the specimen is completely digitized.

Processing of faulty images

At this step, all specimens with deficient images are collected. For these, follow these steps:

Keep a log file, where the cases and the reasons are accounted for. If others search for the specimens, they can look in the log files.
Search for the original file on the scanning machine. If it still exists, delete it.
Check whether the image has already been inserted into the archive.
1. It has not been inserted: Reenter the specimen into the scanning process as a “new scan”; no further processing is needed.
2. It has been inserted: Reinsert it into the process via the rescan priority inbox.

Inserting rescans into the archive

If the image has already been inserted into the archive, the insert of the rescan will fail. After it has failed, you have to open the images window of the database, and go to the thread-logs. Choose the server where a thread log to the latest import is found. Open this thread log and click on the error message of the concerned rescan. Then confirm to force the import, and click on import pictures.

See further information under Forcing import

Information

Inboxes

Inboxes are the places from which the supply for digitization is fetched. In order for a specimen to be digitized, it has to enter one of these inboxes.

The digitization process is potentially supplied with material from more than one source, and the supply is of varying urgency. For example, there might be a request for an image by another institution. In this case, this image is probably more urgent than other material that is to be scanned. Likewise, the control mechanisms might require an image to be scanned again; this, too, is to be processed with priority. On the other hand, the majority of the material will be provided with no specific priority.

A set of inboxes has to be defined. There is the regular inbox stack and there are priority inboxes.

Regular inbox stack

Most of the specimens will be equally important. These are filed in the regular inbox stack. By construction, this stack should have considerably more throughput than all the priority inboxes combined.

Priority inboxes

There are several very different reasons why a specimen is given priority over others:

The image was requested by another institution
The specimen has a faulty picture. This should be corrected as quickly as possible.
Within the global workflow, special cases pop up, which should be handled individually.
A specimen has already been digitized, but the information on the specimen has changed since. In this case a new version of the image has to be taken and named accordingly.
…

In general, if there is something special about a specimen that is to be digitized, it should be prioritised. Therefore, priority inboxes for different classes of cases have to be introduced. For example:

Inbox image requests
Inbox rescans
Inbox unique cases
Inbox new scans (of already digitized specimens)
…

These inboxes then have to be ordered by priority. Which categories the priority inboxes include depends on the institution.

Filenaming conventions

The image file has to be saved following a strict naming convention. Otherwise the viewer will not be able to find the image corresponding to a dataset. The naming convention consists of three tokens: the prefix, the body (number) and the postfix. Every file has a prefix and a body; the postfix is optional (depending on the situation).

For example the file w_0044765_01 consists of

w	0044765	01
prefix	body	postfix

The prefix, body and postfix are always separated by underscores.

Note: Tabulae have different conventions for each part!
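As an illustration of the token structure, here is a minimal Python sketch that splits a file name (without extension) at the underscores. The function name split_name is an assumption made for this example, and the postfix is returned as a list because postfixes can be stacked, as described under The postfix.

```python
def split_name(name):
    """Split a file name (without extension) into prefix, body and postfix parts."""
    tokens = name.split("_")
    if len(tokens) < 2:
        raise ValueError(f"expected at least a prefix and a body: {name!r}")
    prefix, body, postfixes = tokens[0], tokens[1], tokens[2:]
    if not body.isdigit():
        raise ValueError(f"the body must be a number: {name!r}")
    return prefix, body, postfixes

print(split_name("w_0044765_01"))   # ('w', '0044765', ['01'])
```

Because only underscores separate the tokens, a prefix such as w-krypt is kept whole, while a name that uses a hyphen instead of an underscore is rejected.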

The prefix

The prefix usually represents the herbarium where the specimen lies.

Examples:

w for the Herbarium Vienna
w-krypt for the cryptogam collection in W.
wu for the University of Vienna
…

For tabulae, this prefix is always “tab”.

The body (number)

The body is always a number. For specimens, the number is derived from the acquisition number, with the following schemes:

Cases from W:

If the acquisition number starts with a year (e.g. 1889-3454), the hyphen is removed and the number after it is padded with leading zeros until the whole body has 11 digits: 18890003454
If the acquisition number has 7 digits, it is left as it is.
This results in a body that always has either 7 or 11 digits.
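The two cases for W can be written down as a short Python sketch. The function name body_from_acquisition_number is an assumption made for this example; anything that matches neither scheme is rejected rather than guessed.

```python
import re

def body_from_acquisition_number(acq):
    """Build the file name body for a W acquisition number."""
    acq = acq.strip()
    with_year = re.fullmatch(r"(\d{4})-(\d{1,7})", acq)
    if with_year:
        year, number = with_year.groups()
        return year + number.zfill(7)   # 4 + 7 = 11 digits
    if re.fullmatch(r"\d{7}", acq):
        return acq                      # 7-digit numbers stay unchanged
    raise ValueError(f"unrecognised acquisition number: {acq!r}")

print(body_from_acquisition_number("1889-3454"))   # 18890003454
print(body_from_acquisition_number("0044765"))     # 0044765
```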

For tabulae, the body is the database ID. It is found in the edit specimens-page in the top left corner.

The postfix

Postfixes are not necessary for basic scans. They are required in certain circumstances. For specimens, there are two reasons for postfixes:

Not all the information of a specimen can be made visible on one image.
1. If this is the case, first a picture of the whole specimen is taken, without adding a postfix.
2. Then, hidden parts are made visible, and selections of those parts are scanned. For this, the prefix and body stay the same, but the additional scans are given lowercase letters as a postfix (e.g. _a, _b, …).
Something changed on the specimen, and a new scan is required to reflect the change (e.g. the specimen returns from a loan with annotations).
1. In this case, the specimen is scanned again, but with an added postfix.
2. Starting with the first revision scan, the postfix _01 is added, and with further new versions, they are counted in the two digit postfix (_01, _02, _03, …)

Note that these postfixes can be stacked. For example, if something changed in a hidden area of the specimen 1889-3454 from W:

This area must already have been digitized as w_18890003454_a
The revision scan then would be w_18890003454_a_01
A further revision would produce w_18890003454_a_02
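The revision counting above can be sketched in Python as follows. The function name next_revision is an assumption made for this example. It is meant for specimen names only and relies on the body having more than two digits, so that a trailing two-digit token can only be a revision postfix.

```python
import re

# A trailing "_NN" (exactly two digits) is treated as the revision postfix.
REVISION = re.compile(r"^(?P<stem>.+)_(?P<rev>\d{2})$")

def next_revision(name):
    """Return the file name for the next revision scan of a specimen image."""
    match = REVISION.match(name)
    if match is None:
        return f"{name}_01"             # first revision scan
    number = int(match.group("rev")) + 1
    if number > 99:
        raise ValueError(f"no two-digit revision left for {name!r}")
    return f"{match.group('stem')}_{number:02d}"

print(next_revision("w_18890003454_a"))      # w_18890003454_a_01
print(next_revision("w_18890003454_a_01"))   # w_18890003454_a_02
```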

If there is more than one tabula for a specimen, they are numbered using the two-digit postfix (e.g. tab_000123_01, tab_000123_02, …). Should one of the above cases with revisions or hidden parts arise for tabulae, they are also numbered in the two-digit postfix.

Scanning conventions

How the scanning setup is operated depends on the setup itself. The following conventions, however, should be followed largely independently of the concrete setup.

Preparing the dataset

If a specimen has already been databased, always fetch the dataset when preparing the sheet. Depending on the material a quick glance or a more thorough look should be given to the correctness of the dataset. Finally, it is good practice to copy the number for the filename from the dataset, instead of taking it from the sheet (this reduces copy mistakes, but watch out for blanks). Check the dig. image checkbox of the dataset only after finishing the scan.

Preparing the sheet

The image should show as much information of the sheet as possible. To this end, capsules should be emptied onto the sheet so the contents can be seen. If the specimen cannot be prepared in such a way that all of the information is visible in one image (a label hides a plant, something is on the backside), multiple pictures have to be taken so that all the information is accessible (see filenaming conventions for further information). To make it easier to extract the information from the image, take care that the sheet is prepared in a structured way, and leave space for the color target and the scale, which are placed on it on the table. For cryptogams: if there are both vegetative and fertile parts of the specimen, make sure that both are visible.

Working on the scanning table

The time the specimen lies on the scanning table should be minimized. A color target and a scale should always be added on the specimen for reference. The color target should be placed along the left edge of the sheet, preferably in the lower corner, with the white square pointing to the lower right. The scale should be placed preferably on the right or left edge, in the middle. If there isn't enough space on the sheet place the target and/or scale next to it. After finishing the scan, mark the specimen and its folder as scanned (if necessary).

Kew check

Digital errors are barely visible to the naked eye, and are expected to be overlooked. For this reason, a script was devised at Kew to make these errors visible. This script is to be run as standard procedure on all files. This control mechanism is called the Kew check. This check generates distorted files of the original, with a hugely boosted contrast, where digital errors are easily seen.
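The actual Kew script is a Photoshop action and is not reproduced here. Purely to illustrate why boosting the contrast makes faint scanner artefacts visible, here is a small Python sketch; the function boost_contrast, the gain of 20 and the sample values are all assumptions and have nothing to do with the real script's settings.

```python
def boost_contrast(row, gain=20.0):
    """Amplify deviations from the row's mean brightness, clipped to 0..255."""
    mean = sum(row) / len(row)
    return [max(0, min(255, round(mean + gain * (v - mean)))) for v in row]

# One row of 8-bit grey values; the fifth pixel is 3 levels too bright,
# which is practically invisible in the original image.
row = [120, 120, 120, 120, 123, 120, 120, 120]
print(boost_contrast(row))   # [113, 113, 113, 113, 173, 113, 113, 113]
```

A difference of 3 grey levels becomes a difference of 60, which stands out clearly. The same amplification also distorts the rest of the image, which is why the control images look unnatural.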

Running the script

Open Photoshop and go to Automation → Batch execution. There, select the right script, the newly scanned files, and a valid target folder. Now just run the script, and wait until it has generated all the control images.

Checking the images

Use an Image viewer (e.g. IrfanView) to look at each of the control images separately:

Resize the image
Scroll up and down

Take a look at the original file for everything that looks suspicious. Since the control images are very distorted, this will be a lot in the beginning. But with a little practice, you get a feeling what is normal.

Once you have a feeling for how a good picture looks in the Kew control image, look for things that don't belong there:

Vertical bright lines (usually very thin, indicator of a dead pixel on the scanning row)
Horizontal bright lines (some are very thin, but they appear in every thickness)
A flaming pattern (an indicator for problems in the coloring of the image)
anything else that looks suspicious

If you have found something suspicious, check back with the original file whether this is truly an error, or you were mistaken. For every image that has an error, get the specimen, delete the picture, and reinsert it into the scanning process (for example into a rescan priority inbox).

Tabulae

Any sheet of paper - including photos - that is not fixed to the specimen is called a tabula here. If a specimen has one or more such tabulae, each has to be digitized on its own. Most important is the distinct naming convention for tabulae:

A tabula always has the prefix tab.
The body of a tabula is the specimen ID, not the acquisition number.
Only the double-digit postfix is used (_01, …). Multiple tabulae to a specimen, new versions and hidden parts (backside) are all treated equally by numbering them in the postfix.

Tabulae need neither color targets nor scales.

Institution specific Information

Herbarium W

This is a work in progress…

W: Priority Inboxes

The priority inboxes in W are as follows (in order of priority):

Unique cases (very rare)
Loan/Image requests
Rescans
Scans before/after separation
New scans of changed material
Regular inbox stack
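Picking the next inbox to work on can be sketched in Python as follows. This assumes that the list above is read from highest to lowest priority; the function name next_inbox and the example counts are made up for illustration.

```python
# Inbox order for W as listed above, assumed to be highest priority first.
W_INBOX_ORDER = [
    "Unique cases",
    "Loan/Image requests",
    "Rescans",
    "Scans before/after separation",
    "New scans of changed material",
    "Regular inbox stack",
]

def next_inbox(counts, order=W_INBOX_ORDER):
    """Return the highest-priority inbox that still holds specimens, or None."""
    for inbox in order:
        if counts.get(inbox, 0) > 0:
            return inbox
    return None

# Hypothetical state of the inboxes: number of waiting specimens per inbox.
print(next_inbox({"Regular inbox stack": 40, "Rescans": 2}))   # Rescans
```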

W: Filenaming conventions

The prefix for W is w. For cryptogams, it is w-krypt.
The body has either 7 or 11 digits. If there is a year in the acquisition number, the scheme is YYYYNNNNNNN where YYYY is the year, and NNNNNNN is the acquisition number, with added zeros so the whole number always has 11 digits. If there is no year in the number, it is just NNNNNNN the acquisition number, with zeros added until it has 7 digits.
The folder structure follows a year - day - scheme:
images\pcXX\specimens\originale\YYYY\YYMMDD (XX can be 01 or 02)
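The folder for a given scanning day can be derived from the date. The following Python sketch builds the path according to the scheme above; the function name day_folder and the example date are assumptions, and the path is relative because the root of the images folder is not stated here.

```python
from datetime import date
from pathlib import PureWindowsPath

def day_folder(pc, day):
    """Return the relative folder for one scanning day on the given scanning PC."""
    if pc not in ("01", "02"):
        raise ValueError("pc must be '01' or '02'")
    return PureWindowsPath("images", f"pc{pc}", "specimens", "originale",
                           day.strftime("%Y"), day.strftime("%y%m%d"))

print(day_folder("01", date(2015, 3, 12)))
# images\pc01\specimens\originale\2015\150312
```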

W: Mamiya-Leaf

W: Mamiya-Leaf hardware setup & working environment

¹⁾

Note: Unlike the Pentacon Setup, the iXR/Credo setup does not need any special light conditions. There is no need to switch off any light or warm up the flashes.

Note: Please sign in at the Google Calendar for scanning and fix the time you wish to work on the scanners. There is a very tight schedule for scanning.

!! important: please do NOT personalize any settings of the software !!

W: Mamiya-Leaf day to day workflow roundup

Note: Please read the in-depth startup sequence guide if you are not familiar with the system!

start computer and camera (see computer-startup and camera-startup)
- the latest session is loaded.
create new session. Session naming conventions are [Date]_01 for morning and [Date]_02 for afternoon, e.g. YYMMDD_01.
- the latest session is automatically closed when a new session is started.
set “Arbeitsfläche Stand_1” (select from Menu: “Fenster / Arbeitsfläche / Stand_1”)
delete [KameraZähler] from the “Format” panel
check depth of focus:
- set aperture to 4
- take a picture of the focus wedge
- reduce illumination
- 3,5 marker on the focus wedge must be sharp
- set illumination back to 0 and aperture back to 22
create LCC-Profile (naming convention: YYMMDD, see LCC-profile) - once per day!
set white fader - once per day!
take pictures
end the session:
- check whether all names fit the file naming convention
- select all pictures of the session in the CaptureOne software
- set “LCC” to the day's preset and “Schärfe” and “Klarheit” to the “Stand_01” default. Be aware that if many pictures were taken, the LCC preset and the defaults may take a few seconds to highlight in orange.
- develop pictures as desired (see developing)
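The session naming convention from the list above ([Date]_01 for morning, [Date]_02 for afternoon, with the date written as YYMMDD) can be expressed as a small Python sketch. The function name session_name is an assumption; whether a session counts as morning or afternoon is passed in explicitly, since no cut-off time is defined here.

```python
from datetime import date

def session_name(day, afternoon=False):
    """Return YYMMDD_01 for a morning session, YYMMDD_02 for an afternoon one."""
    return f"{day.strftime('%y%m%d')}_{'02' if afternoon else '01'}"

print(session_name(date(2015, 3, 12)))                   # 150312_01
print(session_name(date(2015, 3, 12), afternoon=True))   # 150312_02
```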

W: Mamiya-Leaf startup sequence

Note: Ensure that the camera power supply is turned off before proceeding!

1) Computer startup:

Start computer
Wait until startup and login are completed.
Check if Credo80 databack is visible in systemtray.
- if not check the camera status light,
  - if this is off, turn on databack by clicking the switch-on button.
  - if status light is still off, check the battery and replace it if necessary!

2) Camera startup:

Switch on the power supply. (This releases one flash and) the camera shutter can be heard.
Wait ten seconds until the system is completely started
Start CaptureOne software
The databack acknowledges with a beep and the camera is available from the software (status light changes from blinking to permanent)
(Activate the camera via software to produce a testshot/picture)

3) Camera activation:

Open the CaptureOne program
Check if the camera is active:
- If the camera is attached and recognized correctly you'll hear a short beep.
- In the Windows command line the symbol for connected USB device is on and shows “Leaf Credo 80” as connected.
- In the CaptureOne program click the camera symbol and check the state of connection.

If the camera is inactive:
- Check the USB cable connection.
- Check the battery of the camera.
- Check if the camera is in standby mode:
  - Release camera manually to wake up from standby mode.
  - If this is without result, turn back on digital release. The connection should now be established but no release of flashlights possible (the green status light on the camera body is blinking).
  - Switch off the power line for 20 seconds. After turning the power on again, you should see a flash and the green status light on the camera should be on (not blinking). Release from the CaptureOne software should now be possible.

4) Depth of focus check:

Note: please do this only if the picture is not sharp!

set up the focus wedge
set aperture to 4
turn on live feed (camera symbol)
turn the focussing ring of the camera lens till the 3,5 marking on the focus wedge is sharp and clear. Be aware of the short delay between live feed and actual movement.
set aperture back to 22
take a picture and check the result

6) LCC profile

Note: this step has to be taken every day!
Note: this has to be done BEFORE any pictures are taken and WITHOUT any specimens on the table.

set flash to 10 (on both devices)
hold the opal glass plate in front of the lens and release the camera
select from Menu “Objektiv” - “LCC” - “LCC erstellen” - tick both options - “Erstelle” - wait until it has finished
set the white balance with the pipette tool by clicking into the image
save the LCC profile - in the menu “Voreinstellungen” choose “Benutzereinstellungen sichern…” (format YYMMDD)
set the flash back to 8,7
take one photo to discharge the flash

7) Taking pictures

search for the specimen in the database and copy the specimen number
insert the number into the “Name”-field in the camera program (filenaming conventions!)
prepare the specimen for scanning (place it on the table and add the targets)
press the camera-symbol in the program to release the camera and flashes
wait for the picture to check if everything is ok
mark the specimen as processed and put it away

8) Developing

Note: Activate batch processing (top left)!

Menu item “Ausgabe” (gear symbol on the left)
- processing presets (“Verarbeitungsvorgaben”): format “TIFF” “8bit”
- output location (“Ausgabeort”): “Output”
click on “Verarbeiten”
wait (up to 10 sec per image?) - the gear symbols on the images change color from orange to grey

9) Screen calibration:

Note: this step is taken automatically every second week on Monday.

Manual calibration (if necessary):

Open “ColorNavigator 6” on the desktop.
Select “100cd 5800K 2,20”, which is the top entry of “Targets”.
Click on “Adjust”.
Check Measurement Device “CG277 Built-in(21895034)” - this is the name of the screen.
Click on “Next”.
Click on “Proceed”. The calibration process starts releasing the sensor in the lower left section of the monitor. The whole process lasts about 3-4 minutes.
Wait until the sensor is back in its cover and the software switches back to target selection page.
The lower right button turns to “Finish”. Click it.
The software switches back to the start screen; the lower right button turns to “Quit”.

screen shots

10) Annotations and troubleshooting

check the basic settings
- aperture (“Blende”) 22
- exposure (“Belichtung”) 1/125
- flash (“Blitz”) 8,7
- proof profile “AdobeRGB (1998)”
check sharpening (“Schärfung”) - preset “Stand_1”
- amount (“Stärke”) 295
- radius 0,9
- threshold (“Schwellenwert”) 1,0
check clarity (“Klarheit”) - preset “Stand_1”
- method (“Methode”) “Durchschlag”
- clarity (“Klarheit”) -7
- structure (“Struktur”) 36
A specimen with several database entries (A, B, C, …) can be copied and renamed in Explorer - caution: the original image is copied (straightening and cropping are missing!) - check in the software whether the name has been taken over! (Alternatively, simply photograph it as often as necessary and then adjust the names.)
Positioning large specimens works well using the live view setting (go to aperture 4 - set it back to 22 before taking the picture!!!)
Message about insufficient disk space: please free up the hard disk (or have it freed up) - if problems occur while photographing → close the program and open it again (the old session is reopened) and then continue taking photos as normal

W: Mamiya-Leaf shutdown sequence

close software - databack quits with a beep.
shut down the computer.
shut down power supply.
go home.

W: Pentacon Scanner

W: Setup the cameras

Information can be found on the 'cheat sheet' in the scanning room.

W: Setup calibration

Pick the correct calibration target.
- The thin target (R080213) for PC02 (pentacon 6000)
- The thick target (R040507) for PC01 (pentacon 5000)
Place the calibration target in the middle of the scanning table (arrows indicate the best place).
Do a prescan of the calibration target.
Click on the calibration button.

You will see a mask with the structure of the target. Pull this mask in such a way, that it covers the pattern of the calibration target. (An image explaining this appears when you click the calibration button)
Once the mask is set, click on Start. The program will perform the calibration.
If the calibration was successful, you will be prompted to save the result. If not, you will be asked to readjust the mask.

W: Setup scan

W: Day-to-day preparation

Turn on all four scanning lamps, and not the room lighting.
Even if you only use one scanner, all lamps should be turned on, so the scans have a constant brightness.
Wait for about 10 minutes. Do a couple of prescans to warm up the camera.
Create a new folder on the computer for the scanning day. (See filenaming conventions)
Before the first scan, calibrate the scanner.
The first scan of the day is the grey-card with the additional target.
Start the regular scans.

W: Ending a scanning day

Check the Thumbnail view of the day. (Does the coloring look natural, are scale and color target on every scan?)
Are the filenames according to the filenaming conventions?
Turn off all lamps, computers, and everything else (radio, fan, …)

W: Scanner troubleshooting

If you have pressed a wrong button, you can undo this using the following keyboard shortcut:
While holding down Strg press the Z-key once. (Ctrl-z on an English keyboard)
- Alternatively, if you have pressed BILDAUTOMATIK: You can go to Histogram, press Reset+OK, and then to Gradiationskurve and press Reset+OK
If a program is not responding, give it enough time (5-15min) to recover before taking other measures.
- If it is still not responding, ask someone experienced.
If the scanner is not found, quit the scanning software. Unplug and replug the scanner, and restart the software.
- If it still does not respond, reboot the computer.

W: Synchronization

Use the FileSync program to synchronize only the scan days not processed yet.
The target folder structure should mirror the folder structure of the scanning PC, but this should be set by default, there is usually no reason to change this.

W: Log files

There is a log sheet beside each scanner. For each scanner, write down:

the date you start scanning
if you have performed a calibration
if the Kew script has been run
if the Kew check has been made
if the synchronization has been made.

A log file is to be found on google drive for the rescans. Fill in:

The name and the number
In which stage of rescanning it is found
why it is to be rescanned
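If the rescan log were kept as a plain CSV file instead of a spreadsheet, one entry could be written as in the following Python sketch. The column names, the function append_rescan and the sample entry are assumptions made for illustration; they only mirror the three points listed above.

```python
import csv
import io

# Hypothetical column names mirroring the points listed above.
FIELDS = ["name", "number", "stage", "reason"]

def append_rescan(log, name, number, stage, reason):
    """Append one rescan record to an open CSV log."""
    csv.writer(log).writerow([name, number, stage, reason])

# An in-memory file stands in for the real log here.
log = io.StringIO()
csv.writer(log).writerow(FIELDS)
append_rescan(log, "A. Example", "w_18890003454", "waiting for rescan", "vertical bright line")
print(log.getvalue())
```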

W-Krypt: Saving convention

Images are copied to an internal server at the end of a scanning day.

¹⁾ This is work in progress

JACQ Documentation
