NA-MIC Project Weeks


Slicer-to-Action for surgical robot imitation learning

Key Investigators

Taewoo Yoon (AIRS Inc, Republic of Korea)
Joonho Seo (Korea Institute of Machinery and Materials, Republic of Korea)
Minjune Kim (AIRS Inc, Republic of Korea)

GitHub Repository

lerobot

Funding Source(s)

(to be added)

Project Description

We are developing a robotic system for fracture reduction using SlicerROS2. The goal of this project is to investigate whether recent imitation learning approaches, such as visuomotor policies or Vision-Language-Action (VLA) models, can be applied to this system. Here, 3D Slicer displays real-time object movements in a 3D view, and we intend to use that view as training data for robot motion, so that it plays the role of a real camera mounted on the robot. It will also let us build custom simulation environments in which to generate virtual data and train on it.

Objective

Establish a pipeline that integrates 3D Slicer’s visual data with robot joint data for policy learning and real-time inference.
Establish an environment dedicated to generating synthetic simulation data and training on it.

Approach and Plan

1. Set up an environment that tracks an object rigidly coupled to the robot in the 3D view, so that its pose updates dynamically as the robot moves.
2. Capture the 3D view for a specified number of frames while simultaneously acquiring the corresponding robot joint data.
3. Train the policy on the dataset created in step 2.
4. After training, use the joint positions inferred by the policy as control commands for either a simulation tool or the physical robot.
5. Feed the current 3D Slicer view into the trained model for inference.
6. Send the inferred joint positions to the simulation tool or the physical robot and, as the robot moves, feed the updated 3D Slicer view back in as the next state input.
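The closed loop at the end of this plan (view in, joint positions out, updated view back in) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `rollout`, `toy_render`, `toy_policy`, and `TARGET` are hypothetical names, the "image" is a plain list standing in for a rendered 3D Slicer frame, and the policy is a hand-written rule rather than a trained network.

```python
from typing import Callable, List, Sequence

Joints = List[float]
Image = List[float]  # placeholder for a rendered frame; a real frame would be an RGB array


def rollout(
    policy: Callable[[Image, Joints], Joints],
    render_view: Callable[[Joints], Image],
    send_command: Callable[[Joints], None],
    initial_joints: Sequence[float],
    max_steps: int,
) -> List[Joints]:
    """Closed loop: render the view, infer joint targets, command them, repeat."""
    joints = list(initial_joints)
    trajectory = [joints]
    for _ in range(max_steps):
        image = render_view(joints)           # current view is the observation
        joints = list(policy(image, joints))  # inferred joint positions
        send_command(joints)                  # simulator or physical robot
        trajectory.append(joints)
    return trajectory


# Toy stand-ins (assumptions) so the sketch runs without Slicer or a robot.
TARGET = [0.0, 0.0, 0.0]


def toy_render(joints: Joints) -> Image:
    # Pretend the "image" directly encodes the joint error; a real policy
    # would have to infer this from pixels.
    return [j - t for j, t in zip(joints, TARGET)]


def toy_policy(image: Image, joints: Joints) -> Joints:
    # Move halfway toward the target on every step.
    return [j - 0.5 * e for j, e in zip(joints, image)]


sent: List[Joints] = []
trajectory = rollout(toy_policy, toy_render, sent.append, [8.0, -4.0, 2.0], max_steps=5)
print(trajectory[-1])  # [0.25, -0.125, 0.0625]
```

In a real setup, `render_view` would be replaced by a capture of the 3D Slicer view and `send_command` by the interface to the simulator or robot. The loop structure itself should not need to change.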

Progress and Next Steps

1. Synthetic Training Data Generation
Randomly generate diverse bone fragment configurations.
Generate and apply a reduction trajectory that reduces each configuration.
During the reduction process, capture images from four views (Axial, Lateral, Medial, ISO) in 3D Slicer, and store the corresponding strut lengths as an episode.
Save the data as a dataset of 100+ episodes.
2. Training Policy
Feed the stored dataset into the LeRobot visuomotor policy framework.
Train an ACT (Action Chunking with Transformers) policy to obtain the policy network (100K steps, loss < 0.03).
3. Run Reduction
Randomly generate an arbitrary bone fragment configuration.
(1) Capture the four views in Slicer and feed them, together with the corresponding strut lengths, into the policy as the observation.
(2) Infer strut lengths.
(3) Compute forward kinematics from the inferred strut lengths and update the robot’s pose.
(4) Return to (1) and iterate.
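As an illustration of the data generation stage, one possible episode layout is sketched below. Several points are assumptions and not taken from the project: the strut count (`NUM_STRUTS = 6`), the length range, the linear interpolation used as the reduction trajectory, sampling strut lengths directly instead of deriving them from a bone fragment configuration, and the use of the next frame's strut lengths as the action label. `placeholder_render` is a hypothetical stand-in for capturing a view in 3D Slicer.

```python
import random
from typing import Callable, Dict, List

VIEWS = ("axial", "lateral", "medial", "iso")
NUM_STRUTS = 6  # assumption: the source does not state the number of struts
GOAL = [150.0] * NUM_STRUTS  # hypothetical strut lengths of the reduced pose


def random_configuration(rng: random.Random, low: float = 100.0, high: float = 200.0) -> List[float]:
    """Sample one random set of strut lengths (range is an assumption)."""
    return [rng.uniform(low, high) for _ in range(NUM_STRUTS)]


def reduction_trajectory(start: List[float], goal: List[float], num_frames: int) -> List[List[float]]:
    """Linearly interpolate strut lengths from start to goal, endpoints included."""
    steps = num_frames - 1
    return [
        [s + (g - s) * k / steps for s, g in zip(start, goal)]
        for k in range(num_frames)
    ]


def placeholder_render(view: str, lengths: List[float]) -> str:
    # Stand-in for an image capture; returns a label instead of pixels.
    return f"{view}:{sum(lengths):.1f}"


def build_episode(
    start: List[float],
    goal: List[float],
    num_frames: int,
    render: Callable[[str, List[float]], str],
) -> List[Dict]:
    """Pair each frame's four views and strut lengths with the next frame's lengths."""
    path = reduction_trajectory(start, goal, num_frames)
    return [
        {
            "images": {view: render(view, path[k]) for view in VIEWS},
            "state": path[k],
            "action": path[k + 1],
        }
        for k in range(num_frames - 1)
    ]


rng = random.Random(0)
dataset = [
    build_episode(random_configuration(rng), GOAL, 20, placeholder_render)
    for _ in range(100)
]
print(len(dataset), len(dataset[0]))
```

Each frame holds the four view images plus the current strut lengths as the observation, which matches the observation described in step (1) of the reduction loop. Converting such episodes into the dataset format that LeRobot expects is a separate step not shown here.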

Illustrations

Background and References

[1] T. Z. Zhao, V. Kumar, S. Levine, and C. Finn, “Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware,” in Proc. Robotics: Science and Systems (RSS), 2023. arXiv:2304.13705.
[2] R. Cadene, S. Alibert, A. Soare, Q. Gallouédec, A. Zouitine, T. Wolf, et al., “LeRobot: State-of-the-art Machine Learning for Real-World Robotics in PyTorch,” 2024. [Online]. Available: https://github.com/huggingface/lerobot