
WO2025076539A1 - Neural network-enabled system for estimating age-at-death using radiographic images - Google Patents


Info

Publication number
WO2025076539A1
WO2025076539A1 (PCT/US2024/050257)
Authority
WO
WIPO (PCT)
Prior art keywords
age
neural network
convolutional
radiographic images
predicted age
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/050257
Other languages
French (fr)
Inventor
Katherine D. VAN SCHAIK
Moustafa ABDALLA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vanderbilt University
Original Assignee
Vanderbilt University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vanderbilt University filed Critical Vanderbilt University
Publication of WO2025076539A1 publication Critical patent/WO2025076539A1/en
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 6/00 Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment
    • A61B 6/50 Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment specially adapted for specific body parts; specially adapted for specific clinical applications
    • A61B 6/505 Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment specially adapted for specific body parts; specially adapted for specific clinical applications for diagnosis of bone
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 6/00 Apparatus or devices for radiation diagnosis; Apparatus or devices for radiation diagnosis combined with radiation therapy equipment
    • A61B 6/52 Devices using data or image processing specially adapted for radiation diagnosis
    • A61B 6/5211 Devices using data or image processing specially adapted for radiation diagnosis involving processing of medical diagnostic data
    • A61B 6/5217 Devices using data or image processing specially adapted for radiation diagnosis involving processing of medical diagnostic data extracting a diagnostic or physiological parameter from medical diagnostic data
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 30/00 ICT specially adapted for the handling or processing of medical images
    • G16H 30/40 ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The disclosed system predicts the biological age of skeletal remains using a convolutional neural network (trained, for example, using 693 radiographs from 136 adults interred in lead coffins in the eighteenth and nineteenth centuries in the crypt of London's St. Bride's Church). Additionally, to increase explainability and minimize the risk of overfitting, the disclosed system uses backpropagation to generate heatmaps that localize relevant regions of the skeletal remains that are class-discriminative (i.e., most important to the model in predicting the age-at-death).

Description

NEURAL NETWORK-ENABLED SYSTEM FOR ESTIMATING AGE-AT-DEATH USING RADIOGRAPHIC IMAGES
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Prov. Pat. Appl. No. 63/588,131, filed October 5, 2023, which is hereby incorporated by reference.
FEDERAL FUNDING
[0002] None
BACKGROUND
[0003] Assessment of biological age-at-death remains a challenging yet integral dimension of bioarchaeological studies. Fundamentally, bioarchaeologists “want to estimate... the probability that the person died at a certain age given that [the remains have] one or more skeletal characteristics...To do that we need an appropriate reference sample where age-at-death is accurately reported, the sample approximates the population of interest, and a sufficient number of individuals are distributed throughout all of adulthood. Those requirements are hard to meet” (Milner et al., 2019, p.609). As has been extensively described in existing literature, different methods of assessing biological age-at-death all have their own challenges, and the origins of those challenges are multi-factorial. The nature of the reference sample in which those skeletal characteristics are identified matters; for example, in a sample that consists primarily of younger people, a trait typically associated with older individuals would show a relatively increased frequency in younger people, only because the sample itself had more younger people in it. Furthermore, the “skeletal characteristics” that bioarcheologists evaluate are fundamentally affected by heterogeneity in frailty (DeWitte & Stojanowski, 2015; Wood et al., 1992).
[0004] Transition analysis, which seeks to obtain estimates of a trait’s presence within a population independently of the age distribution of the reference sample, has offered some solutions to these problems and represents an improvement upon some methodologies. However, Milner and Boldsen’s transition analysis estimates still bear witness to the difficulties of identifying age-at-death for individuals who were older than age 60 when they died (G. R. Milner & Boldsen, 2012).
[0005] Quantifying the relationship between heterogeneity in frailty and skeletal aging, Mays’ 2015 survey article argues that approximately 60% of anatomical variation in skeletal age estimates is not caused by age (Mays, 2015). However, even by Mays’ daunting estimates, 40% of the anatomical variation in skeletal estimates could be caused by age. Characteristics that have been traditionally used to assess age include features of the auricular surface and the pubic symphysis (Brooks & Suchey, 1990; Lovejoy, Meindl, Mensforth, et al., 1985; Lovejoy, Meindl, Pryzbeck, et al., 1985; G. R. Milner & Boldsen, 2012). Other characteristics, too, are known to correlate broadly with aging, including osteoarthritis and osteophyte formation (Shane Anderson & Loeser, 2010). Reliance on these features of bones in archaeological contexts can be problematic, however, as articular surfaces — and the pelvis — are irregularly shaped and less likely to survive burial intact, compared with the more regularly shaped diaphyseal section of long bones, which also has a thicker cortex. Not infrequently, only the diaphyseal long bones will remain intact in burial environments, rendering detailed assessments of age impossible because the sections of bone required to make such assessments are not preserved.
[0006] Changes in bone quality, including osteoporotic change (Agarwal, 2018, 2021), can be used to assess age-at-death, although bone quality can be greatly affected by burial environments. An interdisciplinary focus on assessment of the aging human skeleton, especially in bioarchaeological contexts, is the most promising way forward, as was highlighted in a panel at the 2022 annual North American meeting of the Paleopathology Association. New methods have increasingly been employed to characterize and to quantify metrics associated with aging in the human skeleton, including epigenetic analysis (Lee et al., 2020) and assessment by radiological techniques (Van Schaik et al., 2018, 2019).
[0007] Convolutional neural networks are a class of artificial neural network that are often applied to analyze visual imagery (Lecun & Bengio, 1995). The utility of all neural networks, including convolutional neural networks, is their ability to approximate any function mapping an input to an output (e.g., predicting sex from a radiograph, where the input is the image and the output is the sex) through a series of complex computations (Funahashi, 1989). Like other neural networks, convolutional neural networks consist of an input layer (used to feed in the input image), hidden layers (that perform computations approximating the desired function), and an output layer (that classifies the image into the desired output classes). Convolutional models differ from other neural network architectures in that they apply their computations (called convolutions or kernels) in a way that successfully captures spatial dependencies in an image while minimizing the number of parameters required for the model (Lecun & Bengio, 1995). These convolution kernels ‘slide’ across the image to generate feature maps (i.e., ‘processed images’) that are fed into subsequent layers; neural networks are often multiple layers deep. By applying a series of convolutions, as well as other layers (e.g., pooling layers, fully connected layers, and normalization layers), the model can extract the high-level features (e.g., edges, outlines) that can be used for prediction.
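To make the kernel-sliding operation concrete, the following toy sketch (illustrative only; not part of the disclosed system) convolves a small synthetic image containing a vertical edge with a hand-coded 3 x 3 edge-detection kernel, producing a feature map that responds strongly along the edge:

```python
import numpy as np
import tensorflow as tf

# Toy 8 x 8 grayscale image whose right half is bright: a vertical edge.
image = np.zeros((1, 8, 8, 1), dtype=np.float32)
image[:, :, 4:, :] = 1.0

# A hand-coded vertical-edge kernel; a trained CNN would learn such kernels.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3, dtype=np.float32).reshape(3, 3, 1, 1)

# 'Slide' the kernel across the image to generate a feature map.
feature_map = tf.nn.conv2d(image, kernel, strides=1, padding='VALID')
print(feature_map[0, :, :, 0].numpy())  # strong responses at the edge columns
```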
[0008] Deep learning techniques, including convolutional neural networks, have been very successful in extracting biological age from medical data (Pyrkov et al., 2018). For example, deep neural networks have successfully predicted sex and chronological age (to within 2.1 years) on the basis of healthy adult chest radiographs (Yang et al., 2021). The anatomical regions most important for the age prediction model were the spine, ribs, aortic arch, heart, and soft tissue of the thorax. Although Yang et al.’s model incorporated elements of soft tissues, which generally would not be available in an archaeological context, other convolutional neural network models applied to radiographs have been able to focus more exclusively on bone in order to predict age, for example, using knee radiographs for pediatric age estimation (Demircioglu et al., 2022).
[0009] Of relevance for the assessment of age-at-death in bioarchaeological contexts, machine learning models have been productively applied, with diverse imaging modalities that include conventional radiography, computed tomography (CT), and magnetic resonance imaging (MRI).
[0010] However, existing techniques rely on extensive physiological and biomarker data that are mostly not available for the skeletonized individuals excavated at archaeological sites. For instance, existing models that are applied in clinical contexts use images that include soft tissue to make age assessments of living patients. As there is effectively no soft tissue available for analysis in most archaeological contexts, these dimensions of ML-based age assessment are not applicable to bioarchaeological skeletal remains. Additionally, clinically-acquired radiographs are obtained in a standardized way (e.g., with bones in similar positions). The relative uniformity of those images facilitates identification of the subtle differences that enable ML-based age assessment. However, such standardization across images is often not possible for archaeological remains, as excavated bones are in highly variable states of decay and incompleteness. Finally, in addition to the soft tissue described above, existing models use images of the entire bone, while bones in archaeological contexts are almost always incomplete to some degree and are often missing the epiphyseal and metaphyseal components that are especially useful for the detection of the osteoarthritic changes that existing clinical models use to assist in age prediction.
[0011] Therefore, even though machine learning models have been used in bioarchaeology in the assessment and characterization of osteoporosis, bone quality, and fracture risk (Ferizi et al., 2019), for sex identification (Miholca et al., 2016), and to predict stature from archaeological skeletal remains (Czibula et al., 2016), there have been almost no studies using the tools of artificial intelligence to assess human remains from archaeological contexts.
SUMMARY
[0012] The disclosed system predicts the biological age of skeletal remains using a convolutional neural network (trained, for example, using 693 radiographs from 136 adults interred in lead coffins in the eighteenth and nineteenth centuries in the crypt of London’s St. Bride’s Church). Additionally, to increase explainability and minimize the risk of overfitting, the disclosed system uses backpropagation to generate heatmaps that localize relevant regions of the skeletal remains that are class-discriminative (i.e., most important to the model in predicting the age-at-death).
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] Aspects of exemplary embodiments may be better understood with reference to the accompanying drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of exemplary embodiments.
[0014] FIG. 1A is a block diagram of a neural network-enabled system for estimating age-at-death using radiographic images according to exemplary embodiments.
[0015] FIG. 1B is a diagram of the example system of FIG. 1A in greater detail.
[0016] FIG. 2 is a diagram of an architecture of the system of FIG. 1 according to exemplary embodiments.
[0017] FIGS. 3A through 3E are heatmaps generated by the disclosed system using radiographic images of bones of a female individual, age-at-death of 63, including humeri (FIG. 3A), a pelvis (FIG. 3B), tibiae (FIG. 3C), a right femur (FIG. 3D), and a left femur (FIG. 3E).
[0018] FIGS. 4A through 4E are heatmaps generated by the disclosed system using radiographic images of bones of a male individual, age-at-death of 75, including humeri (FIG. 4A), a pelvis (FIG. 4B), tibiae (FIG. 4C), a right femur (FIG. 4D), and a left femur (FIG. 4E).
[0019] FIGS. 5A and 5B are heatmaps generated by the disclosed system using radiographic images of bones of a female individual, age-at-death of 54, including right and left humeri (FIG. 5A) and a pelvis (FIG. 5B).
[0020] FIGS. 6A through 6D are heatmaps generated by the disclosed system using radiographic images of bones of a male individual, age-at-death of 34, including a pelvis (FIG. 6A), a right femur (FIG. 6B), a left femur (FIG. 6C), and a tibia (FIG. 6D).
DETAILED DESCRIPTION
[0021] FIG. 1A is a block diagram of a neural network-enabled system 100 for estimating age-at-death 150 using radiographic images 110 according to exemplary embodiments.
[0022] In the embodiment of FIG. 1A, the system 100 includes a neural network 140 trained using training data 180, a preprocessing module 120, and a heatmap generation module 160. The neural network 140 may be, for example, a convolutional neural network 140 constructed using TensorFlow v2.8. The neural network 140 takes in a radiographic image 110 of any bone as input and generates a predicted age 150 (e.g., an age range in ten-year increments) at which the individual died. Each input radiographic image 110 may be a 2D matrix with 3 channels encoding the color of the image.
[0023] In the embodiment of FIG. 1A, the neural network 140 includes an input layer 141, a series of 2D convolutional layers 142 and 2D max pooling layers 143, a flatten layer 145, a dense fully connected layer 146, a dropout layer 147, and an output layer 148.
[0024] The input 141 to the first convolutional layer 142 may be a 480 x 640 x 3 image, where 480 x 640 corresponds to the standardized size of the radiographic image 110 output by the preprocessing module 120 (discussed below) and 3 denotes the number of (RGB color) channels. The first convolutional layer 142 may have 64 filters (or, equivalently, kernels) of size 2 x 2 x 64, where 2 x 2 denotes the size of the filter and 64 denotes the number of channels for that filter, and a non-linear rectified (ReLU) activation function (with weights initialized, e.g., using an He uniform variance scaling initializer). The output of each filter may be a locally connected structure, convolved with the input radiographic image 110, to produce 64 feature maps, which may then be max pooled with the output of other filters from the convolutional layer 142 by the subsequent max pooling layer 143. These feature maps then serve as input for the subsequent layer (e.g., a subsequent convolutional layer 142 and max pooling layer 143). Embodiments may include any number of convolutional layers 142 and max pooling layers 143 (e.g., four convolutional layers 142 and four max pooling layers 143).
[0025] In the embodiment of FIG. 1A, the convolutional neural network 140 includes a flatten layer 145 that converts the multi-dimensional output from the final max pooling layer 143 into a one-dimensional vector for processing by a dense fully connected layer 146. The dense fully connected layer 146 may include, for example, 512 nodes and a ReLU activation.
[0026] In the embodiment of FIG. 1A, the convolutional neural network 140 also includes a dropout layer 147 (a model regularizer that limits co-adaptation and improves generalizability by randomly zeroing input values) that randomly drops units and their connections (Hinton et al., 2012). The dropout probability of the dropout layer 147 may be set to 0.5. The output layer 148 may include a number of softmax output neurons 149 (e.g., seven softmax output neurons 149) corresponding to the estimated age 150 (e.g., less than 31 years of age, 31-40, 41-50, 51-60, 61-70, 71-80, and greater than 80 years old) of the individual depicted in the input radiographic image 110.
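By way of illustration, the following Keras sketch assembles the layers recited above (a 480 x 640 x 3 input, four convolutional/max-pooling pairs, a flatten layer, a 512-node dense layer, dropout at 0.5, and seven softmax outputs). It is a minimal, hypothetical reconstruction from the stated sizes; details not recited in the text (e.g., pooling sizes, per-layer filter counts) are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical reconstruction of the described architecture (sizes from the text).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(480, 640, 3)),              # standardized RGB radiograph
    layers.Conv2D(64, (2, 2), activation='relu',
                  kernel_initializer='he_uniform'),   # first of four conv layers 142
    layers.MaxPooling2D((2, 2)),                      # max pooling layer 143
    layers.Conv2D(64, (2, 2), activation='relu', kernel_initializer='he_uniform'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (2, 2), activation='relu', kernel_initializer='he_uniform'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (2, 2), activation='relu', kernel_initializer='he_uniform'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                 # flatten layer 145
    layers.Dense(512, activation='relu'),             # dense fully connected layer 146
    layers.Dropout(0.5),                              # dropout layer 147
    layers.Dense(7, activation='softmax'),            # seven age-range classes
])
```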
[0027] The convolutional neural network 140 is trained using training data 180 that includes radiographic images 110 of bones of individuals having known ages 150 at death (and may also include other information, such as the sex 114 of each individual). For example, the convolutional neural network 140 may be trained using the radiographic images 110 of the individuals interred in the crypt of St. Bride’s Church described below.
[0028] To enable construction of the models and ensure their generalizability, the disclosed system 100 may include a preprocessing module 120 that removes all text/labels (e.g., left-right markers and radiograph labels) from the radiographic images 110. (Otherwise, the model may use the text on the radiographic image 110 in prediction rather than focusing on extracting features from the bones/skeletal remains.) For example, the preprocessing module 120 may include an optical character recognition module 122 (e.g., pre-trained keras-ocr models) that obtains bounding box coordinates of all text on the radiographs and a masking module 124 that replaces the text using an inpainting algorithm (e.g., realized using OpenCV) to create a text-free radiographic image 110. The preprocessing module 120 may also include a standardization module 126 that standardizes the input size of the radiographic images 110 (e.g., to 480 x 640 pixels).
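A minimal sketch of such a preprocessing pipeline, assuming the keras-ocr and OpenCV tools named above (the function and variable names here are hypothetical), might look like:

```python
import cv2
import keras_ocr
import numpy as np

pipeline = keras_ocr.pipeline.Pipeline()  # pre-trained text detector + recognizer

def remove_text_and_standardize(image):
    """Mask detected text via inpainting, then resize to 480 x 640 pixels.

    `image` is assumed to be an RGB uint8 array, as keras-ocr expects.
    """
    predictions = pipeline.recognize([image])[0]         # [(word, box), ...]
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    for _word, box in predictions:
        cv2.fillPoly(mask, [box.astype(np.int32)], 255)  # mark text pixels
    inpainted = cv2.inpaint(image, mask, inpaintRadius=3,
                            flags=cv2.INPAINT_TELEA)     # fill from surroundings
    return cv2.resize(inpainted, (640, 480))             # OpenCV takes (width, height)
```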
[0029] The loss function used to optimize the model may be defined using categorical crossentropy, designed to quantify the difference between two probability distributions for multi-class prediction tasks. Model weights may be updated using Adam, an algorithm for gradient-based optimization of stochastic objective functions (Kingma & Ba, 2014). The model may be trained for 50 cycles, before exiting early using a validation set (e.g., 10% of the training data 180) and a patience of 3.
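Continuing the hypothetical model sketch above, those stated choices (categorical cross-entropy, Adam, 50 training cycles, early stopping with patience 3 on a held-out validation split) might translate into a Keras training call like the following; `train_images` and `train_labels` are placeholders, and `restore_best_weights` is an added assumption:

```python
import tensorflow as tf

model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # multi-class prediction loss
              metrics=['accuracy'])

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=3,             # exit early after 3 stalled epochs
    restore_best_weights=True)                  # assumption, not stated in the text

model.fit(train_images, train_labels,           # labels one-hot over 7 age classes
          epochs=50,
          validation_split=0.1,                 # 10% of the training data held out
          callbacks=[early_stop])
```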
[0030] To identify which features the model is examining for age-at-death prediction, embodiments of the system 100 include a heatmap generation module 160 that is configured to generate saliency heatmaps 190 that localize relevant regions of the skeletal remains that are class-discriminative (i.e., that are identified by the model as being important for correctly predicting the age-at-death 150 of the individual). As described in detail below with reference to FIG. 1B, the heatmap generation module 160 may include a gradient setting process 162, backpropagation 164 through the neural network 140, a pooling process 165, a summation process 166, and a combining process 168.
[0031] FIG. 1B is a diagram illustrating the exemplary system 100 of FIG. 1A in greater detail. The dropout layer 147 of FIG. 1A is omitted for clarity.
[0032] As shown in FIG. 1B, each convolutional layer 142 generates feature maps $k$ by applying an activation function and each max pooling layer 143 down-samples those feature maps $k$ (forming what is referred to in FIG. 1B as down-sampled feature maps $k'$). The flatten layer 145 takes the two-dimensional feature maps $k'$ from the final max pooling layer 143 and flattens them into a one-dimensional vector $V_k$. The fully connected layer 146 applies an activation function $A_{V_k}$ to the linear combination of the vector $V_k$, weights $W$, and biases $b$. The output layer 148 converts the output $F$ of the fully connected layer 146 into a probability distribution over the classes $c$ (i.e., the predicted ages 150). For each class $c$, for example, a softmax function may calculate the probability $y_c = e^{F_c} / \sum_j e^{F_j}$, where $y_c$ is the predicted probability for class $c$, $F_c$ is the output of the fully connected layer 146 for class $c$, and $\sum_j e^{F_j}$ is the sum of the exponentials of the outputs for all classes (ensuring that the probabilities sum to 1). The predicted class $\hat{c}$ is determined by selecting the class with the highest probability: $\hat{c} = \arg\max_c y_c$.
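For instance, a small NumPy sketch of this softmax-and-argmax step (with hypothetical logits for the seven age classes) is:

```python
import numpy as np

def softmax(F):
    # Subtract the max for numerical stability; the probabilities sum to 1.
    e = np.exp(F - np.max(F))
    return e / e.sum()

F = np.array([1.2, 0.3, 2.5, 0.1, 0.0, -0.4, 0.7])  # hypothetical outputs, one per class
y = softmax(F)            # predicted probability y_c for each age class
c_hat = int(np.argmax(y)) # -> 2, i.e., the 41-50 age range in the scheme above
```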
[0033] During training, the gradient of the loss function with respect to the output $F$ is used to update the network’s weights $W$. The gradient for class $c$ may be represented as $\partial L / \partial F_c = y_c - \hat{y}_c$, where $L$ is the loss function, $y_c$ is the predicted probability for class $c$, and $\hat{y}_c$ is the true label for class $c$ (i.e., 1 if the known age 150 of an individual in the training data 180 is within class $c$ and 0 otherwise).
[0034] In the embodiment of FIG. 1B, the heatmap generation module 160 is realized using Grad-CAM visualization, where the radiographic image $I$ (referred to as radiographic image 110 with reference to FIG. 1A) is propagated through the convolutional neural network 140 to generate the feature maps $k$. In the gradient setting process 162, the gradient 156 for the determined class $c$ (i.e., the estimated age 150) is set to 1 and the gradients 156 for all other classes $c'$ are set to 0. A backpropagation process 164 is then used to focus on class $c$ and calculate how changes in the feature maps $k$ influence the probability $y_c$. Specifically, the gradient of the probability $y_c$ for class $c$ is calculated with respect to the forward activation maps $A^k$ of the final convolutional layer 142. In the pooling process 165, those gradients are global average pooled (or global max pooled) over the width $i$ and height $j$ of the input image $I$ to obtain the neuron importance weights $\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y_c}{\partial A_{ij}^k}$ (where $Z$ is the total number of pixels in the input image $I$). By computing $\alpha_k^c$ during backpropagation, the backpropagation 164 and pooling 165 processes amount to successive matrix multiplications of the weight matrices and the gradients with respect to the activation functions $A$ of the neural network 140 all the way through to the final convolutional layer 142 where the gradients are being propagated. Accordingly, each weight $\alpha_k^c$ can be understood to represent the “importance” of each feature map $k$ for classifying the radiographic image $I$ as belonging to the target class $c$.
[0035] In the summation process 166, a weighted linear combination $\sum_k \alpha_k^c A^k$ of the forward activation maps $A^k$ is calculated (each map weighted by its respective weight $\alpha_k^c$). A rectified linear unit (ReLU) may be applied to the linear combination, so that the resulting localization map $L_{Grad\text{-}CAM} = \mathrm{ReLU}\left(\sum_k \alpha_k^c A^k\right)$ includes only the features that have a positive influence on the class $c$ of interest (i.e., pixels whose intensity should be increased in order to increase $y_c$). In the combining process 168, the localization map $L_{Grad\text{-}CAM}$ is used to highlight the identified locations of the radiographic image $I$ and form the heatmap $H$ (identified in FIG. 1A as heatmap 190). Accordingly, the heatmap generation module 160 captures the important skeletal features, which may be weighted with the forward activation maps to highlight the features of the radiographic image 110 that the model examines to make a particular age prediction 150.
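A compact Grad-CAM sketch along these lines is shown below, assuming a built Keras model and a hypothetical name for the final convolutional layer. It computes the neuron importance weights by global average pooling the gradients, forms the weighted combination of activation maps, and applies a ReLU:

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, class_index, conv_layer_name):
    # Map the input to both the final conv layer's activations A^k and the
    # softmax outputs y (assumes the model has been built/called already).
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_maps, preds = grad_model(image[tf.newaxis, ...])
        y_c = preds[:, class_index]                 # probability for target class c
    grads = tape.gradient(y_c, conv_maps)           # dy_c / dA^k
    alpha = tf.reduce_mean(grads, axis=(1, 2))      # global average pool -> a_k^c
    cam = tf.einsum('bk,bijk->bij', alpha, conv_maps)  # sum_k a_k^c * A^k
    cam = tf.nn.relu(cam)[0]                        # keep positive influence only
    cam = cam / (tf.reduce_max(cam) + 1e-8)         # normalize to [0, 1]
    return cam.numpy()  # upsample and overlay on the radiograph to form the heatmap
```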
[0036] Using backpropagation, the heatmap generation module 160 is able to identify what the model is “looking at” when making its predictions. More formally, the heatmap generation module 160 generates heatmaps that localize relevant regions of the skeletal remains that are class-discriminative (i.e., the features of the radiographic image 110 that are most informative for the class prediction). In other words, the heatmaps 190 highlight the parts of the radiographic image 110 that are important for correctly predicting the age-at-death 150 of the individual. Figures 3A-3E and 4A-4E provide typical examples of heatmaps for a female and a male individual for whom all bones were available, and Figures 5A-5B and 6A-6D provide examples of heatmaps for a female and a male for whom fewer bones were available. Brighter areas reflect increased “attention” by the model. In the pelvis, predictions tend to be based on the acetabular surface and the sacroiliac joints. In long bones, articular surfaces were more difficult to assess, as epiphyses were often severely damaged or absent. In these cases, the model directs its attention to the diaphyseal cortex.
[0037] Using the convolutional neural network 140 described above, the disclosed system 100 can generate a predicted age 150 (e.g., a predicted age range) for an individual using only a single radiographic image 110 of a single bone from that individual. Meanwhile, in situations where multiple radiographic images 110 (e.g., of multiple bones) of an individual are available, the disclosed system 100 may generate a predicted age 150 (e.g., a predicted age range) of that individual by identifying the most commonly observed age prediction 150 (i.e., the mode) across all radiographic images 110 of that individual.
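A simple realization of this mode-based aggregation (a sketch; the disclosed system’s tie-breaking behavior is not specified) is:

```python
from collections import Counter

def aggregate_predictions(per_radiograph_classes):
    # Predicted age range for the individual = the most commonly observed
    # class across that individual's radiographs (ties resolved arbitrarily).
    return Counter(per_radiograph_classes).most_common(1)[0][0]

# e.g., five radiographs of one individual, classes indexed 0-6:
print(aggregate_predictions([3, 3, 2, 3, 4]))  # -> 3 (the 51-60 age range)
```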
[0038] FIG. 2 is a diagram of a hardware environment 200 of the disclosed system 100 according to exemplary embodiments.
[0039] As shown in FIG. 2, the system 100 may be realized as a hardware computing system 240, including non-transitory memory 248 storing instructions and at least one hardware computer processing unit 242 executing those instructions to perform the functions described herein. The computing system 240 may include any computing device capable of performing those functions (for example, a server, a personal computing device, etc.). As described above, the system 100 receives radiographic images 110 of bones of individuals and outputs an estimated age 150 of each individual. The radiographic images 110 may be received from a remote computer 210 via a network 250 (e.g., a local area network, the Internet) or via any wired or wireless communication link. The estimated age 150 and heatmaps 190 may be output, for example, via a graphical user interface. The computing system 240 may also include non-transitory computer readable storage media 280 (or communicate with external storage media 280 via a wired, wireless, or network connection).
Benefits of the Disclosed System
[0040] By using an age estimation method that does not require multifactorial analysis of skeletons or various age indicators, the disclosed system 100 is capable of predicting biological age-at-death 150 using only a single radiographic image 110 of a single bone, which is especially useful for assessment of the fragmented remains that are commonly encountered in archaeological contexts. Meanwhile, in situations where multiple radiographic images 110 (e.g., of multiple bones) of an individual are available, the disclosed system 100 synthesizes those multiple radiographic images 110 in a standardized way to facilitate multifactorial analysis of the skeleton and obtain an accurate age estimation 150. Accordingly, the disclosed system 100 is capable of producing accurate and reliable estimates across the entire human life span, is simple to use and applicable to most (if not all) archaeological contexts, and is capable of significantly improving existing age-estimation methods in a non-destructive way.
[0041] Additionally, by generating heatmaps 190 that identify the features identified and used by the model, the disclosed system 100 enables users to “understand the model” and ensure that the model is focusing on actual bone features (in particular those that reflect degenerative change and are already key components of traditional aging methods) rather than making a prediction using irrelevant features (e.g., text within the radiographic image 110).
[0042] Additionally, by using radiographic images 110 captured using consistent parameters, the disclosed system 100 uses features of bone radiographs 110 that cannot be captured using prior art methods. As shown in the heatmaps 190 generated by the system 100, the model is trained to focus on diaphyses, which provide continuous stretches of cortical surface. Cortical thickness measurements, obtained from both radiographs and CT, have also been used by prior art systems to assess age and overall health. However, cortical assessments on the basis of radiographs typically can only measure the thickness. Prior art methods cannot assess cortical density, as the disclosed model may well be doing, because the cortical density registered by digital plate radiography is dependent upon the parameters used for image acquisition, which may be adjusted by the radiographer in order to facilitate an image with greater resolution. However, in a setting in which all or nearly all of the bones were acquired with the same kVp and mAs (as was the case for the training data 180 used to train the disclosed system 100), cortical density can reliably be assessed and compared by AI. Accordingly, in its determination of age-at-death 150, the disclosed system 100 is capable of using features of a bone radiograph 110 that likely could not be captured by other methods of analysis.
[0043] The performance of the disclosed system 100 was assessed using two separate evaluation schemes: (1) predicting age-at-death 150 using all available bones of an individual; and (2) predicting age-at-death 150 using a single bone/radiograph 110. In the latter case, when evaluating performance, we used each radiograph 110 as an independent guess. Unsurprisingly, we observed that the use of all the bones improved performance by looking at the most commonly observed age prediction 150 (the mode) across all bones/radiographs 110 of an individual. This choice enabled multi-bone analysis in a standardized, simple way. However, the second evaluation scheme enabled us to underestimate model performance and to assume the worst-case scenario, which is frequently encountered in archaeological contexts: how would the model perform if only one bone/radiograph 110 were available? Compared to when all bones were available, estimates of age-at-death 150 were slightly worse; fewer data points lead to less reliable estimates (e.g., ‘outlier’ bones may skew the age-at-death estimate). However, importantly, the model was trained to predict the age-at-death 150 estimate using a single bone/radiograph 110 and thus the model is useful even when bones are missing. Therefore, while quality of age-at-death estimation for the individual roughly correlates with the number of bones available, the model is still useful for approximate estimates using single bone predictions.
[0044] How do we know the model is not overfitting? This is a concern, considering the good performance of the model and the “black box” nature of deep neural networks. To us, the best solution to prevent overfitting is to “understand the model”. Thus, aside from the technical features incorporated in the model to minimize overfitting (e.g., the dropout layer 147, use of a validation set for early exit during training), we sought to generate heatmaps 190 that interpret the model predictions as another means of assessing the model performance. The assumption: a robust, generalizable model will tend to focus on actual bone features, in particular those that reflect degenerative change and are already key components of traditional aging methods, as compared to overfitted models that may be predicting using irrelevant features such as text within the image. Thus, although deep learning models might be labelled as “black box”, the saliency heatmaps 190 provide insight into how the model makes its predictions.
[0045] The figures are representative heatmap collections for female (Figures 3A-3E and 5A-5B) and male (Figures 4A-4E and 6A-6D) individuals for whom many bones (Figures 3A-3E and 4A-4E) and few bones (Figures 5A-5B and 6A-6D) were available for analysis. As described above, analysis of the heatmaps 190 shows that the model is indeed focusing on regions of bone that are used by these traditional aging methods, including areas where degenerative change is expected, such as the acetabulum and the sacroiliac joints. Notably, the model is also focusing on diaphyses, which provide continuous stretches of cortical surface.
[0046] This interpretation of the model’s “black box” methodology is further supported by the data that show its bone-specific ability to predict age-at-death 150. Performance by bone varied by only 6%; it is also notable that the highest-accuracy bone (right femur) and lowest-accuracy bone (left femur) are the same, except for sidedness. In other words, despite their very different shapes, all of the other bones exhibited prediction accuracies within a range framed by two bones with identical shapes (despite being mirror opposites). Such results raise the possibility that the model is not looking at the shapes of the bones themselves, or at features that are highly variable across bones, but rather at intrinsic features that are shared among all the bones that belong to a specific individual, such as density or trabecular structure.
RESULTS
[0047] The initial study population used for this analysis consisted of 227 adults interred in lead coffins in the crypt of St. Bride’s Church, London, UK, in the eighteenth and nineteenth centuries. As the coffins in which these individuals were interred contained plates which recorded each person’s name, date of birth, and date of death, this collection provides skeletal remains of known sex and age, enabling creation, training, and testing of the machine learning model described below. The skeletons in this collection are well-preserved, with 88% more than 50% complete and 73% more than 60% complete (with percent completeness referring to the presence or absence of a bone, provided that 50% or more of that bone remained). The skeletons are held in the crypt of St. Bride’s Church, under the care of the Museum of London, in a permanent repository. Age-at-death, sex, and evidence of lesions were recorded in the Wellcome Osteological Research Database (WORD) according to the Museum of London’s standardized recording protocol. In 2015, radiographic imaging was carried out on crania, humeri, pelves (with sacra, when available), femora, and tibiae using a Sedecal 4.0 kW X-ray generator and a Canon Lanmix 35 cm x 43 cm flat plate digital detector. Photographs of each bone were taken alongside the radiographs. Cranial radiographs and photographs had been previously obtained in 2010-2011, using the same radiographic equipment and radiographer.
[0048] The dataset used to train the CNN of the disclosed system consisted of 693 radiographs from 136 individuals. Each individual had at least one radiograph of one of five bones: humerus, pelvis, right femur, left femur, and/or tibiae. A second dataset, from the University of Coimbra, Portugal, was used to validate and refine the model initially developed from the St. Bride’s images.
[0049] On the assumption of a Gaussian distribution of the proportion, a 95% confidence interval was used to evaluate the skill of the disclosed predictive model. A confidence interval describes the bounds of data sampled from the distribution and provides bounds on a population parameter (e.g., a mean, standard deviation, etc.). To evaluate the performance of the disclosed model on a dataset, we measured how well the predictions made by the model matched the observed data using leave-one-out cross-validation, which entails measuring the mean squared error (MSE), calculated as:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - f(x_i)\right)^2$$
where $n$ is the total number of observations, $y_i$ is the response value of the $i$th observation, and $f(x_i)$ is the predicted response value of the $i$th observation. The closer the model predictions are to the observations, the smaller the MSE will be.
[0050] Leave-one-out cross-validation involves splitting the dataset into a training set and a testing set, using all but one observation as the training set. A model is built using only data from the training set, and this model is then used to predict the response value of the one observation left out. The MSE is calculated. This process is repeated n times, where n is the total number of observations in the dataset, leaving out a different observation from the training set each time. The test MSE is then the average of all the test MSEs. Spearman’s rho was then calculated; the Spearman correlation between two variables is equal to the Pearson correlation between the rank values of those two variables. If there are no repeated data values, a perfect Spearman correlation of +1 or -1 occurs when each of the variables is a perfect monotone function of the other.
Model Construction and Evaluation Schema
[0051] To summarize the method in general terms, in order to frame our results: the dataset comprised 136 individuals, for whom we had both radiographs and matched age annotations. To assess the performance of our model architecture, we randomly selected one individual to be withheld for testing, meaning that that individual’s radiographs were not incorporated into the model, which was trained on the remaining 135 individuals. Rather than predicting the age from all the skeletal remains of an individual, each radiograph was fed into the model separately. For example, if an individual died at the age of 50 and his or her remains consisted of a pelvis and a right femur, we created two training examples by inputting the pelvis separately from the right femur. The model would be given the radiograph of the pelvis as input and asked to predict the age (50, in this case). Then, the model would be provided with the image of the right femur and asked to predict the age again. Thus, we treated each radiograph as an independent guess. Our motivation for this choice was two-fold: (1) from a technical perspective, we sought to encourage our model to learn age-predictive features that are shared across all bones, as well as those that are unique to specific bones; (2) from a pragmatic perspective, we wanted our model to be powerful and robust enough to predict age from a single radiograph, as remains from archaeological contexts are often incomplete.
[0052] The model was thus trained independently on each radiograph of the 135 individuals (number of training examples = 135 x number of radiographs per person). For testing, we assessed performance on the one individual withheld for testing. As in training, during testing we fed each radiograph of the withheld individual separately into the model and assessed the prediction using several metrics, including confusion matrix generation, accuracy, and Spearman correlation. This was repeated 136 times to include all available individuals and their associated radiographs. As illustrated in Tables 1 and 2, we subsequently calculated these performance metrics overall for two different categories, testing the ability to predict age-at-death for each bone type and for each individual (by using all available bones for that individual). This enabled us to assess variation in model performance across key categories.
Predictive Performance at the Individual Level
[0053] Using leave-one-out cross-validation, the overall weighted accuracy was 94% when using all available bones/radiographs for an individual, as shown in Table 1:
[Table 1]
In the results of Table 1, Spearman’s correlation was 0.77, the p-value was 6.07E-28, and the weighted accuracy was 0.94.
[0054] Nearly all the individuals classified incorrectly had only 1-2 bones available for assessment, instead of the five bones that were available for other individuals, and/or incomplete bones that were missing epiphyses. To assess the degree of error for those classified inaccurately, we calculated Spearman’s rho to assess how well the predicted ages correlate with the actual ranks, including those that may have been classified erroneously; a rho of 0 indicates no correlation, and a rho of 1 indicates perfect correlation. Using all bones for the individual, Spearman’s rho was calculated to be 0.77 (p-value = 6.1 × 10⁻²⁸). Thus, our model tends to overestimate the age for individuals that are classified incorrectly (Table 1). It is important to note that these performance metrics were calculated on predictions for individuals that were withheld from the model during the entire training process (i.e., never seen by the model during the training phase at all). In other words, Model 1 trained on individuals 1-135 and was tested on individual 136. Model 2 trained on individuals 2-136 and was tested on individual 1. Model 3 trained on individuals 1 and 3-136 and was tested on individual 2, and so on. Each iteration provides one point to show how the model performs; we then examined all of these together to assess general performance.
Predictive Performance by Bone
[0055] As would be expected, performance decreased when predicting age-at-death using a single bone/radiograph. Using leave-one-out cross-validation, the overall weighted accuracy was 82%, as shown in Table 2:
[Table 2]
[0056] There was slight variation in performance by bone, with accuracy ranging from 81% for the left femur to 87% for the right femur. Visual inspection of the radiographs suggests that incompleteness may be the biggest driver of this variation, rather than inherent limitations of the bones themselves. Further supporting this observation, we observed that femora that were too large to capture in one image (and were thus often missing one or both epiphyses) often performed worse than femora imaged within one plate/radiograph.
[0057] While preferred embodiments have been described above, those skilled in the art who have reviewed the present disclosure will readily appreciate that other embodiments can be realized within the scope of the invention. Accordingly, the present invention should be construed as limited only by the appended claims.


CLAIMS

What is claimed is:
1. A system for estimating biological age-at-death, comprising:
a training dataset of radiographic images of bones of individuals and data indicative of the age-at-death of each of the individuals;
a convolutional neural network, trained using the training dataset, that receives at least one radiographic image of at least one bone of an individual and generates a predicted age of the individual; and
a heatmap generation module that uses backpropagation through the convolutional neural network to generate a heatmap identifying portions of the at least one radiographic image used to generate the predicted age of the individual.
2. The system of claim 1, wherein the convolutional neural network comprises a plurality of convolutional layers, the plurality of convolutional layers including a first convolutional layer, one or more intermediate convolutional layers, and a final convolutional layer.
3. The system of claim 2, wherein each convolutional layer applies an activation function to a two-dimensional input to generate two-dimensional feature maps.
4. The system of claim 2, wherein the convolutional neural network is trained by iteratively calculating a gradient of a categorical cross-entropy loss with respect to each of a plurality of weights and using an optimization algorithm to adjust each of the plurality of weights.
5. The system of claim 2, wherein using backpropagation through the convolutional neural network comprises computing the gradients for the predicted age with respect to the activation function performed by the final convolutional layer.
6. The system of claim 5, wherein computing the gradients for the predicted age with respect to the activation function performed by the final convolutional layer comprises:
creating a gradient vector where the gradient for the predicted age is set to 1 and the gradients for all other ages are set to 0; and
computing partial derivatives with respect to the activation function performed by the final convolutional layer.
7. The system of claim 5, wherein computing the gradients for the predicted age with respect to the activation function performed by the final convolutional layer further comprises:
global average pooling the partial derivatives to calculate a weight for each of the two-dimensional feature maps; and
generating a class activation map by computing a weighted sum of each of the two-dimensional feature maps.
8. The system of claim 1, wherein the predicted age is a predicted age range.
9. The system of claim 1, further comprising a preprocessing module that removes text from each of the radiographic images.
10. The system of claim 9, wherein the preprocessing module identifies text in each of the radiographic images using optical character recognition, masks portions of the radiographic images that include text, and inpaints each of the portions of the radiographic images that include text.
11. A method for estimating biological age-at-death, the method comprising:
training a convolutional neural network using a training dataset of radiographic images of bones of individuals and data indicative of the age-at-death of each of the individuals;
providing at least one radiographic image of at least one bone of an individual to the convolutional neural network;
generating, by the convolutional neural network, a predicted age of the individual; and
using backpropagation through the convolutional neural network to generate a heatmap identifying portions of the at least one radiographic image used to generate the predicted age of the individual.
12. The method of claim 11, wherein the convolutional neural network comprises a plurality of convolutional layers, the plurality of convolutional layers including a first convolutional layer, one or more intermediate convolutional layers, and a final convolutional layer.
13. The method of claim 12, wherein each convolutional layer applies an activation function to a two-dimensional input to generate two-dimensional feature maps.
14. The method of claim 12, wherein training the convolutional neural network comprises iteratively calculating a gradient of a categorical cross-entropy loss with respect to each of a plurality of weights and using an optimization algorithm to adjust each of the plurality of weights.
15. The method of claim 12, wherein using backpropagation through the convolutional neural network comprises computing the gradients for the predicted age with respect to the activation function performed by the final convolutional layer.
16. The method of claim 15, wherein computing the gradients for the predicted age with respect to the activation function performed by the final convolutional layer comprises:
creating a gradient vector where the gradient for the predicted age is set to 1 and the gradients for all other ages are set to 0; and
computing partial derivatives with respect to the activation function performed by the final convolutional layer.
17. The method of claim 16, wherein computing the gradients for the predicted age with respect to the activation function performed by the final convolutional layer further comprises:
global average pooling the partial derivatives to calculate a weight for each of the two-dimensional feature maps; and
generating a class activation map by computing a weighted sum of each of the two-dimensional feature maps.
18. The method of claim 11, wherein the predicted age is a predicted age range.
19. The method of claim 11, further comprising: removing text from each of the radiographic images.
20. The method of claim 19, wherein removing the text from each of the radiographic images comprises:
using optical character recognition to identify portions of the radiographic images that include text; and
inpainting each of the portions of the radiographic images that include text.