Feedback is important for surgical trainees, but it can be biased and time-consuming. We examined crowd-sourced assessment as an alternative to experienced surgeons’ assessment of robot-assisted radical prostatectomy (RARP).
In this blinded comparative assessment study, we used video recordings (n = 45) from a previous study of three RARP modules on the RobotiX, Simbionix simulator. A group of crowd workers (CWs) and two experienced RARP surgeons (ESs) evaluated all videos with the modified Global Evaluative Assessment of Robotic Surgery (mGEARS).
One hundred forty-nine CWs performed 1490 video ratings. Internal consistency reliability was high (0.94). Inter-rater reliability and test-retest reliability were low for CWs (0.29 and 0.39) and moderate for ESs (0.61 and 0.68). In an analysis of variance (ANOVA), CWs could not discriminate between the skill levels of the surgeons (p = 0.03–0.89), whereas ESs could (p = 0.034).
We found very low agreement between CWs and ESs in their assessments of robot-assisted radical prostatectomies. Unlike ESs, CWs could not discriminate between levels of surgical experience, either with the mGEARS ratings or when asked whether they would want the surgeons to perform their robotic surgery.
© 2023. The Author(s).