1. The number and diversity requirements for clinical research participants have significantly increased
Old guidance: Only 10 subjects were required to participate in clinical studies.
New draft: Requires at least 150 participants covering diverse characteristics, including skin color (evaluated using the Monk Skin Tone Scale), blood oxygen saturation level (70%-100%), age, gender, BMI, and ethnicity. Each participant should contribute at least 20 paired blood oxygen measurements, for a total of at least 3000 data pairs.
Purpose: To enhance the representativeness of the sample and reduce measurement bias caused by differences in skin color, especially among people with dark skin.
2. Standardization of Skin Color Assessment Methods
New draft: Adds new requirements, introducing the subjective Monk Skin Tone (MST) scale and the objective individual typology angle (ITA) calculation to systematically classify and group subjects by skin color. The study needs to ensure that at least 25% of participants fall into each of the three MST groups (1-4, 5-7, 8-10).
Old guidance: Did not specify skin color assessment methods, which could result in clinical data that did not adequately cover the range of skin colors.
3. Refinement of data distribution and stratification
New draft: Clinical data must be evenly distributed across the blood oxygen saturation ranges (70%-80%, 80%-90%, 90%-100%) and across the skin color groups, with each range covering a minimum share of the data. For example, each blood oxygen saturation range needs to contain at least 30% of the data pairs.
Old guidance: Set no specific requirements for data stratification, which could weaken performance validation of devices across different physiological states.
4. Label declaration and transparency requirements
New draft: Adds a labeling statement. If the device is validated against the new recommendations, its performance across skin color groups must be clearly stated in the labeling, for example "suitable for dark skin groups", or a warning such as "need to be used in conjunction with clinical judgment".
Old guidance: Such statements were not mandatory, so users might not have accurately understood the device's limitations.
5. Scope of application and adjustment of product classification
New draft: The scope is explicitly tied to specific product codes (such as DQA, NLF, and OLK) and excludes some codes covered by the old version (such as OCH and PGJ).
Old guidance: Did not classify the applicable device types in detail, which could make the scope of the guidance ambiguous.
Summary:
The core goal of the new draft guidance is to improve the accuracy of pulse oximeters across populations with different skin tones, and in particular to address the skin-color-related measurement bias reported in earlier studies. By expanding sample diversity, standardizing skin color assessment, refining data stratification, and increasing labeling transparency, the FDA aims to push manufacturers to improve device performance and to give healthcare professionals a more reliable basis for clinical decisions.
(1) Controlled desaturation laboratory research
(2) Other considerations for pulse oximeters for children under 12 years old
(3) Appendix B Considerations for Printing Skin Color Cards
Clinical research is crucial for evaluating the safety and effectiveness of all pulse oximeter systems within the scope of this guidance, as well as ensuring no performance differences between different skin color populations.
Suggestion: We suggest conducting a controlled desaturation laboratory study as described in Annex EE of ISO 80601-2-61, second edition, 2017-12 (corrected version 2018-02), to determine the accuracy of blood oxygen saturation (SpO2) readings. We also suggest using this study to demonstrate non-disparate performance of the new pulse oximeter system across skin pigmentation. In addition, for pulse oximeter systems intended for pediatric populations under 12 years old, we recommend also providing paired (SaO2, SpO2) data from convenience arterial blood samples in that population, to account for sensor form, fit, and clinical performance at the sensor site.
In general, we intend to consider alternatives to clinical testing for demonstrating substantial equivalence when the proposed alternative is supported by sufficient scientific evidence. For example, when a change or modification does not affect the optical path, the signal processing path, or the SpO2 algorithm, additional clinical studies may not be necessary to demonstrate substantial equivalence.
If a clinical investigation is conducted to demonstrate substantial equivalence, i.e., before the device obtains 510(k) clearance, it must comply with the Investigational Device Exemption (IDE) regulations in 21 CFR Part 812. Generally speaking, we believe that the pulse oximeters addressed in this guidance should be considered non-significant risk devices; such studies may therefore be subject only to the abbreviated requirements of 21 CFR 812.2(b). You can refer to the US Food and Drug Administration (FDA) guidance titled "Significant Risk and Nonsignificant Risk Medical Device Studies". In addition, sponsors of device studies conducted in the United States to demonstrate substantial equivalence are required to comply with the regulations on institutional review boards (21 CFR Part 56) and human subject protection (21 CFR Part 50), including the requirements for informed consent.
When data from clinical investigations conducted outside the United States are submitted to the FDA to support an IDE or a premarket submission for these devices, the requirements of 21 CFR 812.28 may apply. 21 CFR 812.28(a) sets out the conditions under which the FDA accepts data from clinical investigations conducted outside the United States to support an IDE or a premarket submission. For more information, please refer to the FDA guidance "Acceptance of Clinical Data to Support Medical Device Applications and Submissions: Frequently Asked Questions".
In some cases, real-world data (RWD) can be used, for example to support expanded indications or non-significant risk performance evaluations for devices that already have 510(k) clearance. The FDA encourages manufacturers to contact the agency when they have questions about RWD. Whether an Investigational Device Exemption (IDE) is required to collect RWD on a legally marketed device depends on the specific circumstances. For example, if a cleared device is used in the normal course of medical practice, an IDE may not be required. For more information on this topic, please refer to the FDA guidance "Use of Real-World Evidence to Support Regulatory Decision-Making for Medical Devices".
(1) Controlled Desaturation Laboratory Research
1. Purpose/Objective
The purpose of an invasive controlled desaturation laboratory study is to verify the SpO2 accuracy of the pulse oximeter system against reference measurements of functional SaO2 obtained with a CO-oximeter, while demonstrating non-disparate performance across skin pigmentation.
2. Research Design
We suggest that you conduct the study in a cohort of 150 or more healthy participants with diverse skin pigmentation, following the method described in Annex EE of ISO 80601-2-61, second edition, 2017-12 (corrected version 2018-02).
For study enrollment, we suggest the following:
Evaluate each participant's forehead pigmentation by visual assessment using the Monk Skin Tone (MST) scale, a subjective 10-level skin tone annotation system with high inter-rater reliability (see the printing recommendations in Appendix B);
Using CIELAB color space terminology, determine the L* and b* values of each participant's forehead pigmentation by colorimetry and calculate the individual typology angle (ITA), defined as:
ITA = arctan((L* - 50) / b*) × 180/π (in degrees)
Record race and ethnicity information at enrollment, in line with Section 3 of the FDA draft guidance on collecting race and ethnicity data in clinical trials and studies of FDA-regulated medical products;
Assign enrolled participants to three MST groups (1-4, 5-7, 8-10), while ensuring the following:
At least 25% of participants belong to each MST group;
At least 50% of participants in MST group 8-10 have a forehead ITA of ≤ -50°; and
In each MST group, at least 40% of participants are male and at least 40% are female.
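The enrollment targets above lend themselves to a simple automated check. The following is a minimal sketch, not part of the guidance: the record layout (`mst`, `sex`, `forehead_ita` keys) and all function names are hypothetical, and the ITA helper assumes b* > 0, which is the usual case for skin measurements.

```python
import math

def ita_degrees(l_star, b_star):
    """Individual typology angle in degrees from CIELAB L* and b*."""
    # atan2 equals arctan((L* - 50) / b*) whenever b* > 0 (assumed here)
    # and avoids a division by zero.
    return math.degrees(math.atan2(l_star - 50.0, b_star))

def mst_group(mst):
    """Map an MST level (1-10) to one of the three cohorts in the text."""
    if not 1 <= mst <= 10:
        raise ValueError("MST level must be between 1 and 10")
    return "1-4" if mst <= 4 else "5-7" if mst <= 7 else "8-10"

def check_enrollment(participants):
    """participants: list of dicts with hypothetical keys
    'mst' (int), 'sex' ('male'/'female'), 'forehead_ita' (degrees).
    Returns a dict mapping each named target to True/False."""
    n = len(participants)
    groups = {"1-4": [], "5-7": [], "8-10": []}
    for p in participants:
        groups[mst_group(p["mst"])].append(p)

    results = {"n>=150": n >= 150}
    for name, members in groups.items():
        size = len(members)
        # At least 25% of all participants in each MST group.
        results[f"{name} share>=25%"] = n > 0 and size / n >= 0.25
        # At least 40% male and at least 40% female within each group.
        for sex in ("male", "female"):
            count = sum(1 for p in members if p["sex"] == sex)
            results[f"{name} {sex}>=40%"] = size > 0 and count / size >= 0.40

    # At least 50% of the MST 8-10 group with forehead ITA <= -50 degrees.
    dark = groups["8-10"]
    low_ita = sum(1 for p in dark if p["forehead_ita"] <= -50.0)
    results["8-10 ITA<=-50 share>=50%"] = bool(dark) and low_ita / len(dark) >= 0.50
    return results
```

Calling `check_enrollment` on the enrollment log as recruitment proceeds shows which targets are still unmet; any `False` entry names the cohort that needs further recruitment.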
We suggest that you include in your premarket submission the protocol used to assign MST values and to evaluate the individual typology angle (ITA). If additional feedback is needed, we recommend engaging with the US Food and Drug Administration (FDA) through the Pre-Submission process described in the guidance "Requests for Feedback and Meetings for Medical Device Submissions: The Q-Submission Program", so that your proposed MST assignment and ITA evaluation protocol can be discussed before the study is conducted.
In addition, we recommend measuring the ITA of the skin surface in direct contact with the sensor emitter. For fingertip sensors, to capture the widest range of skin pigmentation at the sensor placement location, we recommend evaluating the ITA at the sensor site on the pigmented skin surface at the middle of the dorsal side of the distal phalanx (near the nail eponychium, as shown by the yellow circle in Figure 1).
Figure 1: Image of a fingertip
We suggest that you obtain 3000 or more paired observations of SpO2 from the pulse oximeter and SaO2 from the CO-oximeter. We recommend that each participant contribute 20 or more data pairs covering the SaO2 range from 70% to 100%, and that each MST group (MST 1-4, 5-7, 8-10) and each SaO2 decile range (70% ≤ SaO2 < 80%, 80% ≤ SaO2 < 90%, 90% ≤ SaO2 ≤ 100%) contain at least 30% of the data pairs. We suggest that you provide a listing of the data pairs by participant.
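The data-pair targets in this paragraph can be checked mechanically once the pairs are tabulated. The sketch below is illustrative only: the tuple layout `(participant_id, mst_group, sao2, spo2)` and the function names are assumptions, not something the guidance prescribes.

```python
from collections import Counter

SAO2_RANGES = ("70-80", "80-90", "90-100")

def sao2_range(sao2):
    """Label the SaO2 range from the text, or None outside 70-100%."""
    if 70 <= sao2 < 80:
        return "70-80"
    if 80 <= sao2 < 90:
        return "80-90"
    if 90 <= sao2 <= 100:
        return "90-100"
    return None

def check_data_pairs(pairs, mst_groups=("1-4", "5-7", "8-10")):
    """pairs: list of (participant_id, mst_group, sao2, spo2) tuples
    (hypothetical layout). Returns a dict of named pass/fail checks."""
    # Only pairs whose reference SaO2 lies in 70-100% are counted.
    usable = [p for p in pairs if sao2_range(p[2]) is not None]
    total = len(usable)
    per_participant = Counter(p[0] for p in usable)
    per_group = Counter(p[1] for p in usable)
    per_range = Counter(sao2_range(p[2]) for p in usable)

    checks = {
        "total>=3000": total >= 3000,
        "each participant>=20": bool(per_participant)
        and min(per_participant.values()) >= 20,
    }
    for g in mst_groups:
        checks[f"MST {g}>=30%"] = total > 0 and per_group[g] / total >= 0.30
    for r in SAO2_RANGES:
        checks[f"SaO2 {r}>=30%"] = total > 0 and per_range[r] / total >= 0.30
    return checks
```

Because there are three MST groups and three SaO2 ranges, a 30% floor on each means no single group or range can hold more than 40% of the pairs, so the data set has to be close to balanced.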
For more information on the principles of pivotal clinical study design, which play a critical role in determining the substantial equivalence, safety, and effectiveness of medical devices, please refer to the US Food and Drug Administration (FDA) guidance titled "Design Considerations for Pivotal Clinical Investigations for Medical Devices".
3. Inclusion/exclusion criteria
We recommend that your participants are healthy adults who can tolerate the desaturation described in Annex EE of ISO 80601-2-61 Second Edition 2017-12 (revised version 2018-02). In addition, we recommend excluding participants with uneven skin color on the sensor area or forehead.
4. Participant demographics
We suggest that the study population used to determine SpO2 accuracy consist of participants selected consecutively from an available pool of healthy participants, and that these participants be different from those used in the study that developed the device's calibration curve.
We believe that race and ethnicity data should typically be collected and presented in the premarket submission to the FDA, as described in the FDA draft guidance "Collection of Race and Ethnicity Data in Clinical Trials and Clinical Studies for FDA-Regulated Medical Products".
You should describe the characteristics of the participating population that may affect the research results, including:
Age;
Gender;
Body mass index;
Race reported by self/caregiver;
Ethnicity reported by self/caregiver;
The MST and ITA values of each participant's forehead;
The ITA value at the sensor emitter site;
The range of anatomical sizes at the sensor site for which the sensor is suitable;
The range of percent modulation of study participants when the (SaO2, SpO2) data pairs were obtained; and
The percentage of each MST group that tolerated full desaturation (down to an SaO2 of 70%).
For more information on the evaluation and reporting of age-, race-, ethnicity-, and sex-specific data in clinical studies of medical devices, please refer to the FDA guidance documents "Evaluation of Sex-Specific Data in Medical Device Clinical Studies" and "Evaluation and Reporting of Age-, Race-, and Ethnicity-Specific Data in Medical Device Clinical Studies".
5. Protocol
We suggest that you report the range of percent modulation of study participants when the (SaO2, SpO2) data pairs were obtained, and describe in the premarket submission how these values were obtained.
In addition, we recommend SpO2 accuracy testing under motion conditions for all continuous pulse oximeters (real-time monitoring and continuous data archiving) and for non-continuous pulse oximeters intended for use under motion conditions. We suggest that the test report describe the characteristics of each motion used in the test, including its amplitude, type, and frequency, and justify the chosen methods in light of the device's intended use.
6. Effectiveness endpoints and data
We suggest demonstrating, in a statistically significant way (for example, using a 95% confidence interval), that Arms, the root-mean-square difference between SpO2 and the reference SaO2, is less than 3%. We recognize that accuracy is a function of participant characteristics, application site, and sensor geometry, among other factors. Table 3 summarizes the recommended Arms between the measured value (SpO2) and the reference value (SaO2) under normal conditions over the SaO2 range of 70% to 100%.
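As a concrete illustration of the endpoint, the sketch below computes Arms as the root-mean-square of the paired differences SpO2 - SaO2, together with the mean bias. The function names are illustrative, and the code produces point estimates only; it does not construct the confidence interval that the text recommends.

```python
import math

def a_rms(spo2, sao2):
    """Root-mean-square difference between paired SpO2 and SaO2 values."""
    if len(spo2) != len(sao2) or not spo2:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(s - r) ** 2 for s, r in zip(spo2, sao2)]
    return math.sqrt(sum(squared) / len(squared))

def mean_bias(spo2, sao2):
    """Mean error (SpO2 minus SaO2) over all data pairs."""
    if len(spo2) != len(sao2) or not spo2:
        raise ValueError("need two non-empty lists of equal length")
    return sum(s - r for s, r in zip(spo2, sao2)) / len(spo2)
```

For example, two pairs with errors of +1 and -2 percentage points give an Arms of sqrt((1 + 4) / 2), about 1.58, and a mean bias of -0.5. Arms penalizes both systematic bias and scatter, which is why a device with zero mean bias can still fail a 3% Arms limit.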
 
7. Statistical analysis considerations
i. Co-primary analyses
For the pivotal controlled desaturation study, we recommend co-primary analyses of the following performance measures:
1. SpO2 accuracy (Arms) across all study participants.
2. SpO2 bias (mean error) as a function of SaO2 and forehead MST.
3. SpO2 bias (mean error) as a function of SaO2 and the ITA measured on the skin surface in contact with the device's sensor emitter.
ii. Recommended success criteria
For the co-primary analyses, we recommend the following success criteria:
Overall accuracy: Arms is less than 3%.
Non-disparate performance evaluation 1: In pairwise comparisons of MST groups 1-4, 5-7, and 8-10, the maximum difference in SpO2 bias should be less than 3.5% for the range 70% ≤ SaO2 ≤ 85% and less than 1.5% for the range 85% < SaO2 ≤ 100%.
Non-disparate performance evaluation 2: For a 100-degree change in the individual typology angle (ITA) at the sensor emitter site, the difference in SpO2 bias should be less than 3.5% for the range 70% ≤ SaO2 ≤ 85% and less than 1.5% for the range 85% < SaO2 ≤ 100%.
We suggest that all three success criteria be demonstrated with statistical significance, using either a one-sided hypothesis test at the 2.5% significance level (a p-value below 0.025 against the null hypothesis) or a two-sided 95% confidence interval (the relevant bound of the interval satisfies the success criterion for the parameter).
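The first non-disparate performance criterion can be sketched as follows, under loud assumptions: the tuple layout `(mst_group, sao2, spo2)` and the function names are hypothetical, and the code compares point estimates of bias only. It does not perform the hypothesis test or build the confidence interval that the text recommends, and it does not cover the ITA-based criterion, which needs a model of bias against ITA.

```python
from itertools import combinations
from statistics import mean

# Limits on the pairwise difference in SpO2 bias, per SaO2 band (from the text).
LIMITS = {"70-85": 3.5, "85-100": 1.5}

def sao2_band(sao2):
    """Return the SaO2 band used by the success criteria, or None."""
    if 70 <= sao2 <= 85:
        return "70-85"
    if 85 < sao2 <= 100:
        return "85-100"
    return None

def max_bias_gap(pairs):
    """pairs: iterable of (mst_group, sao2, spo2).
    Returns {band: largest absolute pairwise difference in mean bias}."""
    errors = {}
    for group, sao2, spo2 in pairs:
        band = sao2_band(sao2)
        if band is not None:
            errors.setdefault(band, {}).setdefault(group, []).append(spo2 - sao2)
    gaps = {}
    for band, by_group in errors.items():
        biases = [mean(v) for v in by_group.values()]
        gaps[band] = max((abs(a - b) for a, b in combinations(biases, 2)),
                         default=0.0)
    return gaps

def meets_point_criterion(pairs):
    """True/False per band, based on point estimates only."""
    gaps = max_bias_gap(pairs)
    return {band: gap < LIMITS[band] for band, gap in gaps.items()}
```

A result of `True` here is necessary but not sufficient: a submission would still need to show that the upper confidence bound on each difference falls below the limit.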
To describe device performance visually (i.e., agreement, bias, and uncertainty), the US Food and Drug Administration (FDA) recommends that premarket submissions typically include Bland-Altman plots, modified Bland-Altman plots, Q-Q plots, and inverse prediction plots. The FDA recommends using symbols or colors to identify the MST groups (1-4, 5-7, and 8-10) in these plots. The FDA also recommends that the Bland-Altman and modified Bland-Altman plots include the 95% limits of agreement.
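For reference, the classic 95% limits of agreement on a Bland-Altman plot are the mean difference plus or minus 1.96 standard deviations of the differences. The sketch below computes them for a flat list of pairs. It treats the pairs as independent, which is an assumption: with repeated measurements per participant the variance would normally need adjusting, and the text does not spell out the exact construction of the modified plot.

```python
from statistics import mean, stdev

def limits_of_agreement(spo2, sao2, z=1.96):
    """Return (lower, bias, upper) for paired SpO2/SaO2 values.
    Assumes independent pairs; z=1.96 gives the usual 95% limits."""
    if len(spo2) != len(sao2) or len(spo2) < 2:
        raise ValueError("need at least two pairs of equal-length data")
    diffs = [s - r for s, r in zip(spo2, sao2)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias - z * sd, bias, bias + z * sd
```

On the plot itself, the difference SpO2 - SaO2 goes on the vertical axis, with the three returned values drawn as horizontal lines.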
iii. Sample size
The number of study participants should be the largest of the sample sizes needed to meet each success criterion with statistical significance at adequate statistical power (80% or higher is recommended). To achieve adequate power, the US Food and Drug Administration (FDA) recommends a sample of 150 or more participants who meet the inclusion criteria described in Section IV.O(1)b.
The appropriate number of research participants depends on the accuracy of the pulse oximeter, the variability of the data, and the average number of paired repeated measurements (SpO ₂, SaO ₂) for each participant. We recommend that each participant perform an average of 20-24 synchronized paired repeated measurements, with a minimum of 17 and a maximum of 30 paired measurements per participant. Additionally, at least 30% of the data pairs in each decile range of blood oxygen saturation (SaO ₂) (70% ≤ SaO ₂<80%, 80% ≤ SaO ₂<90%, 90% ≤ SaO ₂ ≤ 100%) should be included.
When there is uncertainty regarding data variability or pulse oximeter accuracy, it may be advantageous to conduct an adaptive study that can adjust the sample size based on accumulated data, if feasible.
iv. Analysis population and methods
Performance indicators should be analyzed using the intention to diagnose (ITD) analysis population, which is defined as all participants included in the study and all paired repeated measurement data (SpO ₂, SaO ₂), even if one or both data are invalid, cannot be evaluated, or missing. In other words, regardless of whether the data is complete or not, participants and paired repeated measurement data should not be excluded from the analyzed population. You should report the number and proportion of incomplete data pairs.
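Reporting the number and proportion of incomplete data pairs is straightforward if invalid or missing readings are kept in the data set rather than dropped. A minimal sketch, assuming (hypothetically) that each pair is a `(spo2, sao2)` tuple and that `None` marks an invalid, non-evaluable, or missing value:

```python
def incomplete_pair_summary(pairs):
    """pairs: list of (spo2, sao2); None marks an invalid or missing value.
    Returns the total, the incomplete count, and the incomplete proportion."""
    total = len(pairs)
    incomplete = sum(1 for spo2, sao2 in pairs if spo2 is None or sao2 is None)
    proportion = incomplete / total if total else 0.0
    return {"total": total, "incomplete": incomplete, "proportion": proportion}
```

Keeping the incomplete pairs in the list is what makes the denominator match the intention-to-diagnose population.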
v. Missing data
Measures to reduce missing data:
We suggest that you describe the measures you plan to take during the research process to minimize the occurrence of participant dropout and missing data.
Recording the reasons for missing data:
We suggest that you clarify the reason for missing data when it occurs, for example:
Participants withdraw midway;
The number of paired repeated measurements of participants is insufficient (in terms of quantity or coverage of blood oxygen saturation);
Participants were excluded from the analysis;
The paired repeated measurement data is incomplete (invalid or missing SpO ₂ or SaO ₂).
In order to provide a complete and detailed explanation of the situation of all study participants, we suggest that you collect complete information during the study period. If there is no complete information, the data may be excluded from analysis, which may introduce analytical bias and jeopardize the conclusions that can be drawn about the substantial equivalence, safety, and effectiveness of your device.
8. Grouping sensors for testing
If certain sensors have similar designs or equivalent performance, it may be appropriate to group them for testing. If sensors contain the same materials and optoelectronic components, and have equivalent sensor characteristics (such as usage location), we consider them to have similar designs. If you choose to group test sensors based on similar designs, we suggest that you indicate whether all sensors within each group contain the same materials and optoelectronic components, and describe the reasons for grouping. Generally speaking, clip on sensors and adhesive sensors should not be grouped based on similar designs, as they differ in appearance, adaptability, and functional specifications. If you choose to group test sensors based on equivalent performance, we recommend that you provide valid scientific evidence and statistical analysis to demonstrate that the test results are generalizable.
(2) Other considerations for pulse oximeters for pediatric populations under 12 years old
If the pulse oximeter system is intended for pediatric populations under 12 years old, you should consider data that support the accuracy and clinical performance of the system in the relevant pediatric subpopulations and their associated pathophysiological states. As mentioned earlier in this guidance, pulse oximeter sensors differ significantly in form and fit, and these differences may change the overall accuracy of the system. Therefore, clinical performance testing of a pulse oximeter system in adults (see Section IV.O(1)) may not be sufficient to support clinical performance in certain pediatric subgroups, such as newborns, infants, and children under 12 years old.
If the device is intended for use in pediatric populations under the age of 12, the US Food and Drug Administration (FDA) recommends that manufacturers verify the performance of the device in this population by:
(1) Evaluating the performance of the pulse oximeter system with the pediatric sensors in adult participants of different skin colors, as described in Section IV.O(1)b;
(2) Evaluating performance in pediatric participants within the specific age range (and related clinical pathophysiological states), based on the indications for use and sensor placement locations.
Specifically, this section emphasizes that when pulse oximeters are used in children under 12 years old, their performance cannot simply be inferred from adult test data, because children and adults differ in physiological state, sensor fit, and other respects. The first step in the validation approach proposed by the FDA is to use adult participants for a preliminary evaluation of the pediatric sensors: adults are more cooperative, and choosing adults with different skin colors makes it possible to simulate some of the situations the sensor may encounter. The second step is to evaluate the pulse oximeter directly in the target pediatric population, taking into account that children in different age groups have different clinical, pathological, and physiological states that can affect its performance. For example, the physiological characteristics of newborns differ significantly from those of older children, so performance should be evaluated in the specific pediatric age range that matches the indications for use and the sensor placement.
Although pediatric (for example, neonatal) clinical studies are more representative of the intended use than adult controlled laboratory studies, the sampled data may not cover the full SaO2 range validated in the adult controlled studies, and they are obtained under uncontrolled conditions (such as temperature, comorbidities, and asynchronous data pairs). We nevertheless suggest that you provide data and samples from a sufficient number of participants, evenly distributed across the subgroups of the population, and that you justify the sample size and the SaO2 range covered by the (SaO2, SpO2) data.
In addition, we recommend reporting the range of percent modulation of study participants when the data pairs were obtained. If your study recruits on the basis of skin pigmentation (i.e., the variation in skin pigmentation at the sensor placement site in your pediatric subgroup is expected to be greater than in the adult controlled desaturation study), we recommend that your premarket submission include, for each relevant pediatric subgroup, each participant's reported race and ethnicity, MST measurement site, and MST value, as well as the individual typology angle (ITA) at the sensor emitter site.
For additional feedback on validating the performance of pulse oximeters for patients under 12 years of age, we strongly recommend engaging with the US Food and Drug Administration (FDA) as early as possible through the Pre-Submission process described in the FDA guidance "Requests for Feedback and Meetings for Medical Device Submissions: The Q-Submission Program", in order to discuss the methods and special considerations for supporting the pediatric indications of each device.
Appendix B. Considerations for Printing Monk Skin Tone Color Cards
Clearly defined color levels in a standardized color space such as CIELAB should be used to support the evaluation of non-disparate performance described in Section IV.O(1)b of this document. One available option is the Monk Skin Tone (MST) scale. The US Food and Drug Administration (FDA) recommends evaluating skin color with the MST scale, using color cards based on the L*a*b* values listed in Table B1. We recommend having the color cards professionally printed on suitable paper with a calibrated printer. The accuracy of the printed colors should be verified with a calibrated spectrophotometer.
Table B1: Monk Skin Tone (MST) levels defined in the CIELAB color space
