Automated collection of single species of microfossils using a deep learning – micromanipulator system

Geochemical analyses of a single microfossil species, such as stable isotope ratio measurement, radiocarbon dating, and minor element analysis, require a large number of specimens. Collecting specimens one by one under a microscope requires enormous time and effort. In this study, we developed a device that automates this work and can be used without expert knowledge. Microfossils can be accurately classified and identified to the taxonomic species level using deep learning, a machine learning method within artificial intelligence (AI), and picked up using a micromanipulator installed on a microscope with an automated motorized X-Y stage. A prototype classification model, AI-PIC_20181024, classified the radiolarian species Cycladophora davisiana and Actinomma boreale with accuracy exceeding 90% at a confidence level > 0.90. Using this method, it is possible to collect a large number of particles with speed and accuracy that cannot be achieved by a human technician. Although this technology can only be used for specific species of microfossils, it greatly reduces the manual work of picking and also enables chemical analysis, such as isotope ratio and minor element analysis, of small microfossil species for which it has been difficult to collect enough specimens. In addition to microfossils, this technology can be applied to other particles, with applications expected in various fields, such as medicine, food, horticulture, and materials.


Introduction
Stable isotope ratio, radiocarbon dating, and minor element analyses of a single species of foraminifera, a group of microfossils, require a large number of specimens. In current practice, an expert technician collects the specimens manually, one by one, under a microscope. Recent improvements in analytical technologies have made it possible to carry out various analyses on extremely small samples, but it is still difficult to secure sufficient quantities of sample for analysis. For example, Ijiri et al. (2014) developed an inductive high-temperature carbon reduction method that allows the oxygen isotope ratio of sub-milligram quantities of biogenic opal to be measured. As a result, it has become possible to analyze a single species of radiolarian with a relatively large body, Spongotrochus glacialis. However, it is still necessary to collect more than 1,000 specimens for the majority of radiolarian species (marine zooplankton with opal skeletons) that have body sizes smaller than 100 μm (< 0.1 μg/individual). In practice, it is difficult to collect enough individuals of the small radiolarian species for analysis, although data from these species, which have never been reported, are expected to attract interest. Before such research can be undertaken, it is necessary to develop a method for accurately classifying microfossils of arbitrary species and collecting them in large quantities.
In practical terms, one solution is to combine a micro particle accumulator equipped with a micromanipulator with the latest artificial intelligence (AI) technology, which draws on big data and high-performance computing. The micro particle accumulator can distinguish mineral particles by conventional machine learning (Isozaki et al., 2018), but it has not yet been possible to accurately classify microfossils of delicate and complicated forms. On the other hand, AI using deep learning methods has been used for distinguishing objects appearing in images. Unlike traditional machine learning, where a person extracts features, deep learning extracts features automatically, which makes it suitable for classifying complicated forms such as microfossils. Recently, deep learning has been used to classify volcanic ash particles (Shoji et al., 2018) and foraminiferal tests (Mitra et al., 2019). However, in these previous methods, a large number of microscopic images first needed to be collected.
Therefore, we have developed a system that enables accumulating a massive number of digital images of microfossils identified to the species level by integration of existing technologies such as AI, digital image processing, and precise micromanipulation (AIST Press Release, 2018). In this paper, we present an overview of this system and the results of a practical test with radiolarian fossils.

System overview
The system was designed to carry out the full series of operations, from classification to collection of microfossils, using three units (Fig. 1): (1) Image Collection Unit, (2) Classification Unit, and (3) Particle Collection Unit. These units are connected to each other by a newly developed communication program. Each unit of the system is described below.

Image Collection Unit
This unit automatically acquires microscopic images of a large number of particles scattered on an observation area using an electric X-Y stage microscope with computer control. In this study, "Collection Pro" from Micro Support Co., Ltd., which is documented in Isozaki et al. (2018), was used as a base unit. This unit allows for scanning of particles scattered on the sample tray using a microscope with a high-resolution CCD (charge coupled device) camera in an arbitrary area, and acquires individual images clipped in arbitrary sizes and resolutions by image processing (Supplementary Movie 1). Particle sizes and their coordinates on the sample tray are recorded by the computer.
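The paper does not describe the image processing used by the Image Collection Unit. As a minimal sketch of how particle sizes and coordinates could be derived from a scanned frame, the example below labels connected foreground regions in a binary mask and reports the pixel area, centroid, and bounding box of each. The function name `find_particles`, the 4-connectivity rule, and the toy mask are assumptions made for illustration, not details of the "Collection Pro" software.

```python
from collections import deque

def find_particles(mask):
    """Label 4-connected foreground regions in a binary grid.

    Returns one dict per region with its pixel area, centroid (row, col),
    and bounding box (top, left, bottom, right), all in pixel units.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    particles = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            rs = [p[0] for p in pixels]
            cs = [p[1] for p in pixels]
            particles.append({
                "area": len(pixels),
                "centroid": (sum(rs) / len(rs), sum(cs) / len(cs)),
                "bbox": (min(rs), min(cs), max(rs), max(cs)),
            })
    return particles

# Hypothetical 4 x 5 mask containing two separate particles.
mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
particles = find_particles(mask)
```

In a sketch like this, particles that touch are merged into a single region, so any downstream step would treat them as one individual.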
When a clipped image contains a single particle, the particle is recognized as one individual. However, if several particles overlap, image processing erroneously recognizes them as one individual. To minimize overlapping particles, a total of 60,000 dimples with diameters of 90 and 140 μm were drilled in the tray surface (Supplementary Figure 1) so that one individual particle fits in each indentation. As a result, particles can be dispersed efficiently to some extent and overlapping can be minimized.

Classification Unit
The Classification Unit consists of a computer running deep learning software and a program that facilitates data exchange with the Image Collection Unit. This unit has two roles: building a classification model based on a large number of images obtained from the Image Collection Unit, and using the model to identify particles extracted by the Image Collection Unit during practical classification work. Deep learning was used to classify microfossil species from images. It is a machine learning method based on a multilayered neural network, a computational model inspired by biological neural networks, that can automatically learn features from images. The deep learning software "RAPID machine learning" (NEC Corp.), which incorporates a convolutional neural network (CNN) and can be tuned to quickly construct a classification model from a large amount of data without using a graphics processing unit (GPU), was adopted in this system.

Particle Collection Unit
The target particles determined by the classification results are automatically picked up from the recorded coordinates on the sample tray by the micromanipulator of this unit. A vacuum suction type micromanipulator installed on the Image Collection Unit grips a particle at the tip of a glass tube with a diameter of 30-50 μm (Fig. 2b).
Then, the particle is carried to the collection location and released by stopping the vacuum.
A static-free field around the picking unit is generated by a static electricity remover in order to avoid adsorption of particles so that particles can be deposited. However, when a captured specimen is placed on the sample tray by the micromanipulator, it sometimes does not separate from the nozzle tip due to static electricity. Therefore, an adhesive sheet was placed in the collection area to help separate particles from the nozzle. The microfossils can be removed from the sheet with some ethanol.

Methods/Experimental
In this experiment, siliceous fossil-rich Pleistocene-Holocene sediments collected from the Southern Ocean and the Japan Sea were used. The samples were reacted with 10% HCl and 10% H2O2 solutions in order to remove carbonate and organic matter, respectively, and were then sieved using a 63 μm stainless mesh. Clastic mineral particles were extracted from residues on the 63 μm mesh by the method of Itaki (2006) in order to concentrate siliceous microfossils as much as possible. Dried particles ranging from 63 to 250 μm, which were mainly radiolarians and clastic minerals, were distributed homogeneously on the aluminum sample tray. Although there were many radiolarian fossils in the particles larger than 250 μm, the target species for classification and collection in this study were mainly in the 63 to 250 μm size fraction. Digital images of particles on the sample tray were taken with 280 × 280 dpi resolution under the epi-illumination mode of the Image Collection Unit. A large number of these images were used as the training data for construction of the classification model using deep learning by the Classification Unit. Generally, the training data should be composed of more than 1,000 images in each category, including target species for picking and other kinds of particles. Collecting such a training dataset by hand usually requires a lot of labor and time; however, this system can automate this step. If a sufficient number of images cannot be collected for a category of rare species, the training data are augmented with rotated and flipped copies of a single image (Fig. 3).
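The augmentation by rotations and flips mentioned above can be illustrated with a minimal sketch. It treats an image as a list of pixel rows and generates the distinct copies obtainable from 90-degree rotations and mirroring. Restricting rotations to multiples of 90 degrees is an assumption made here to avoid interpolation; the rotation angles actually used are not stated. The function names are hypothetical.

```python
def rotate90(img):
    """Rotate a 2-D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def flip_horizontal(img):
    """Mirror a 2-D grid left to right."""
    return [list(row[::-1]) for row in img]

def augment(img):
    """Return the distinct rotated and mirrored variants of one image."""
    variants = []
    current = [list(row) for row in img]
    for _ in range(4):
        # Keep each orientation and its mirror image, skipping duplicates
        # that arise when the image itself is symmetric.
        for candidate in (current, flip_horizontal(current)):
            if candidate not in variants:
                variants.append(candidate)
        current = rotate90(current)
    return variants

# Hypothetical 2 x 2 "image" with four different pixel values.
image = [[1, 2],
         [3, 4]]
variants = augment(image)
```

An asymmetric image yields up to eight variants this way, so a category with only a few hundred original images could in principle be brought above the 1,000-image guideline, although the copies carry less independent information than new specimens would.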
The classification model is created from the training dataset using the deep learning software in the Classification Unit. Training is normally run for 30 epochs (Supplementary Figure 2). The accuracy of the created model is tested using images that were not used for model construction. Although the initial accuracy of a newly constructed model is usually low, higher accuracy can be achieved by repeatedly revising the training dataset and retraining.
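Testing a model on images that were not used for its construction amounts to a hold-out split. The sketch below shows one simple way to set aside such images reproducibly. The 20% test fraction, the fixed seed, and the image identifiers are assumptions for illustration; the split actually used is not reported.

```python
import random

def holdout_split(items, test_fraction=0.2, seed=0):
    """Shuffle items reproducibly and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the split repeatable across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical identifiers for 1,000 images of one category.
image_ids = [f"img_{i:04d}" for i in range(1000)]
train_ids, test_ids = holdout_split(image_ids)
```

Splitting before any augmentation keeps rotated or flipped copies of a test image out of the training set.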

Results and discussion
The operation of the system, including model construction, classification, picking, and collection, was checked with radiolarian fossils in deep-sea sediment. Radiolarian fossils, with their complicated and delicate siliceous skeletons, are suitable test material for this system because their species are difficult to identify and their small skeletons are difficult to pick. The test results are described below.

Training data and model construction
In classification by deep learning, every collected image is assigned to one of the previously learned categories. The confidence level of the image is estimated for each category, and the category with the highest value is reported as the classification result. Because all images acquired by the Image Collection Unit are classified into one of several categories, the classification model must be constructed so that it efficiently distinguishes the target species from various other particles. The sample particles used in this experiment consisted mainly of radiolarians and clastic particles. Although the sample included multiple radiolarian species, we decided to build a learning model with Cycladophora davisiana and Actinomma boreale as the target species (Fig. 3).
In the initial constructed model AI-PIC_20181024, approximately 20,000 images of particles were selected from more than 30,000 images collected by the system and were categorized into the following 8 categories: C. ]. Although C. bicornis was not a target species, it was separately categorized for differentiation from C. davisiana due to having similar morphology. Antarctissa spp. and L. pylomaticus were also categorized separately in this model; however, results for these categories were not evaluated in this study due to difficulty in discrimination because of unreliable images from the training data.
Practical tests based on model AI-PIC_20181024 were performed using three samples from the Japan Sea (IODP site U1422C 1H-1, 68-70 cm) and the Del Caño Rise in the Southern Ocean (45.7°S, 44.4°E, 2445 m in water depth, Piston core site DCR1PC, #39). Table 1 shows the confidence level (0.00-1.00), which expresses the certainty of correct classification; the total number of images classified in the scan; the accuracy (number of correct classifications/number of categorized images) for C. davisiana and A. boreale; and the number and percentage of uncategorized images.
In this practical test, the confidence level was set to 0.60 for U1422, and 0.80 and 0.90 for DCR1PC. When the confidence level was 0.60, uncategorized images accounted for 11% of 2,472 images, while when it was 0.90, about 60% of all particle images were considered as uncategorized images and were excluded from classification. The number of uncategorized images tends to be higher with increasing confidence level.
The accuracy for C. davisiana in the same DCR1PC sample was 70% and 92% at confidence levels of 0.80 and 0.90, respectively. Thus, accuracy increased with confidence level, while the number of images categorized as this species dropped. On the other hand, the accuracy for both C. davisiana and A. boreale from sample U1422C was 94% at a confidence level of 0.60. Such high accuracy for both species despite the relatively low confidence level of 0.60 was possibly related to a reduction in misclassification due to the low diversity of radiolarian assemblages in this sample.
As demonstrated above, setting the confidence level high increases classification accuracy, but many images are excluded because they fall below the confidence level. An ideal model achieves both high accuracy and few uncategorized images, as shown for sample U1422C at a confidence level of 0.60. Although model AI-PIC_20181024 was constructed based on training data from both the Japan Sea and the Southern Ocean, higher accuracy may be achieved by building models independently for each region or timeframe using local samples for training data.
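The trade-off between accuracy and uncategorized images can be made concrete with a small sketch. It assigns each image to its highest-confidence category, leaves it uncategorized when that confidence is below the chosen level, and then computes the accuracy for a target species as correct classifications divided by images categorized as that species. This reading of the accuracy definition, the use of "greater than or equal to" for the cut-off, and all confidence values below are assumptions for illustration and are not data from Table 1.

```python
def classify(confidences, threshold):
    """Return the top category, or None if its confidence is below threshold."""
    label = max(confidences, key=confidences.get)
    return label if confidences[label] >= threshold else None

def evaluate(samples, threshold, target):
    """Summarize one scan for a target category.

    samples: list of (confidences, true_label) pairs.
    Returns (accuracy, uncategorized_fraction); accuracy is None when no
    image was assigned to the target category.
    """
    predictions = [classify(conf, threshold) for conf, _ in samples]
    uncategorized = sum(p is None for p in predictions)
    # True labels of the images that the model assigned to the target.
    assigned = [true for p, (_, true) in zip(predictions, samples) if p == target]
    correct = sum(true == target for true in assigned)
    accuracy = correct / len(assigned) if assigned else None
    return accuracy, uncategorized / len(samples)

# Hypothetical confidence outputs for four images.
samples = [
    ({"C. davisiana": 0.95, "A. boreale": 0.03, "other": 0.02}, "C. davisiana"),
    ({"C. davisiana": 0.85, "A. boreale": 0.05, "other": 0.10}, "other"),
    ({"C. davisiana": 0.10, "A. boreale": 0.70, "other": 0.20}, "A. boreale"),
    ({"C. davisiana": 0.92, "A. boreale": 0.02, "other": 0.06}, "C. davisiana"),
]
low = evaluate(samples, 0.80, "C. davisiana")   # 2 of 3 correct, 1 of 4 uncategorized
high = evaluate(samples, 0.90, "C. davisiana")  # 2 of 2 correct, 2 of 4 uncategorized
```

In this toy data, raising the level from 0.80 to 0.90 removes the one misclassified image from the C. davisiana category but doubles the share of uncategorized images, which is the same direction of change reported for sample DCR1PC.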
In addition to radiolarians, classification models for other microfossil groups, such as foraminifers (marine zooplankton with CaCO3 skeletons) and diatoms (phytoplankton with opal skeletons), have been attempted using this system. Although radiolarian species can usually be classified from a single view in one direction, foraminifers should be observed from various directions in order to recognize the 3-dimensional characteristics of the skeleton. This means that a mechanism for distinguishing foraminiferal species from multi-directional images is required. In the case of diatoms, digital images taken with a higher-power microscope lens and processed with composite focusing programs are needed to capture their smaller skeletons. Resolving these issues for the various microfossil groups is the next step in the development of this system.

Particle collection
Using the model AI-PIC_20181024, C. davisiana was selected by the Classification Unit from the residue of the test sample spread on the sample tray of the Image Collection Unit. The classified particle images were displayed on the operation monitor of the Image Collection Unit for checking, and a human technician excluded the misclassified particles from the targets in order to avoid including wrongly identified particles. Correctly classified images can be organized and stored as additional training data, which can later be used to construct a more accurate classification model. After the classification was completed, all identified C. davisiana specimens were sequentially picked up from their recorded coordinates on the tray by the micromanipulator (Supplementary Movie 2). Before collecting the particles, it is necessary to accurately set the position of the suction nozzle tip of the micromanipulator, the approach direction to the particles, and the collection area for the picked particles. The accuracy of this setting is important for reducing pickup failure. In this experiment, 70-80% of the identified particles could be collected successfully. Failure seemed to occur when there was a small distance between the tip of the nozzle and the target. Improving the collection efficiency remains a task for further development of the system.
In this practical test, using samples that contained sufficient numbers of the target species, the picking speed was about 120 specimens per hour. If the sample contains sufficient numbers of the target species, the target number of specimens for analysis may be reached with a single sample, but it may be necessary to repeat the process with additional samples if the number collected is not sufficient.
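As a rough worked example of what these figures imply, the sketch below estimates how many identified particles and how many hours would be needed to reach a target specimen count. It assumes that the rate of about 120 specimens per hour counts successfully collected specimens and takes 0.75 as the midpoint of the reported 70-80% collection success; both are assumptions, and the function name is hypothetical.

```python
import math

def picking_plan(target_count, picks_per_hour=120, success_rate=0.75):
    """Estimate candidates needed and picking time for a target specimen count.

    Assumes picks_per_hour counts successful picks; success_rate is the
    fraction of identified particles that are actually collected.
    """
    candidates_needed = math.ceil(target_count / success_rate)
    hours = target_count / picks_per_hour
    return candidates_needed, hours

# 1,000 specimens, the order of magnitude needed for small radiolarians.
candidates, hours = picking_plan(1000)
```

Under these assumptions, collecting 1,000 specimens would take a little over 8 hours of picking and require about 1,334 correctly identified particles on the trays, which may mean scanning more than one sample.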

Conclusions
An automated microfossil pick-up system incorporating AI technology was newly developed, and a practical test of this system confirmed that a classification model with sufficiently high accuracy can be put to practical use. Using this automated system, microfossils can be collected at the species level, whereas previously a huge amount of time and labor was required to collect the samples by hand. The model AI-PIC_20181024 used in this experiment can classify two radiolarian species, C. davisiana and A. boreale, with accuracies of more than 90% at a confidence level of 0.90, but almost half of the images remain uncategorized. The number of uncategorized images decreased at lower confidence levels (0.60 and 0.80); however, the accuracy also tended to deteriorate at these confidence levels. In order to collect target microfossils more efficiently, it is important to construct a classification model that has high accuracy even at low confidence levels. Because the system and classification model reported in this paper are still in the prototype stage of development, both the device and the model are to be improved for practical use.
In addition to applications to microfossils, this system can be applied to the classification and extraction of mineral particles. This system can be adapted to various specialized sorting needs, such as preparation of high-purity materials in the steel industry, sorting of plant seeds and fish eggs, removal of impurities and contaminants in foods, and sorting of abnormal cells, embryos, or platelets. In order to achieve workable systems in these fields, further development is needed to link AI and microscope systems and to develop a separator suitable for each type of particle.