Using MCP Servers with Robots

How to integrate MCP hosts with robots using an MCP server

The following video explains how to connect an LLM, via an MCP host such as Claude or Cursor, to a robot using an MCP server. It covers the architecture and the code, shows how to integrate the server with the host, and includes a live demo of the robot controlled from both Claude and Cursor.
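At the protocol level, an MCP host such as Claude or Cursor typically launches a local MCP server as a child process and exchanges JSON-RPC 2.0 messages with it, one per line, over STDIN/STDOUT (the stdio transport). The minimal Python sketch below illustrates the three requests that matter most for a robot tool: initialize, tools/list and tools/call. It is not the demo's server (that one is in Node.js, in the GitHub repository), it covers only a small part of the MCP specification, and in practice an official MCP SDK would handle the protocol for you. The move_robot tool, the send_robot_command() stub, the server name and the --stdio flag are all hypothetical and exist only for this sketch.

```python
import json
import sys

# Hypothetical example tool; a real server would describe one tool per robot capability.
TOOLS = [
    {
        "name": "move_robot",
        "description": "Drive the robot in a direction (hypothetical example tool).",
        "inputSchema": {
            "type": "object",
            "properties": {
                "direction": {"type": "string", "enum": ["forward", "backward", "left", "right"]}
            },
            "required": ["direction"],
        },
    }
]


def send_robot_command(direction):
    """Hypothetical stand-in for the real call to the robot (e.g. one of its REST endpoints)."""
    return f"robot moving {direction}"


def handle_request(request):
    """Return the JSON-RPC response dict for one message, or None for notifications."""
    method = request.get("method")
    req_id = request.get("id")
    if req_id is None:
        # Notifications such as "notifications/initialized" carry no id and get no reply
        return None
    if method == "initialize":
        result = {
            "protocolVersion": "2024-11-05",
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "robot-sketch", "version": "0.1.0"},
        }
    elif method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call":
        params = request.get("params") or {}
        if params.get("name") != "move_robot":
            return {"jsonrpc": "2.0", "id": req_id,
                    "error": {"code": -32602, "message": "Unknown tool"}}
        arguments = params.get("arguments") or {}
        text = send_robot_command(arguments.get("direction", ""))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": req_id, "result": result}


def serve(lines):
    """Yield one JSON-encoded response line for every incoming line that needs a reply."""
    for line in lines:
        if not line.strip():
            continue
        response = handle_request(json.loads(line))
        if response is not None:
            yield json.dumps(response)


# "--stdio" is this sketch's own hypothetical flag, so running or importing the
# file without it does not start reading STDIN.
if __name__ == "__main__" and sys.argv[1:] == ["--stdio"]:
    for out in serve(sys.stdin):
        print(out, flush=True)
```

Keeping handle_request() free of I/O makes the tool logic easy to test on its own; only the small loop at the bottom touches STDIN/STDOUT, and nothing else may be printed to STDOUT because the host parses every line as a protocol message.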

Below is some additional supporting documentation that may help with your integration of LLMs and robots.

Code for the video above can be found on GitHub at: https://github.com/johneyesbot/DemoRobotMCPServer

YOLO

The object detection model used on the robot side in the demo video is YOLO (the Ultralytics YOLOv8 nano model, yolov8n.pt), which is very easy to use. The code that runs YOLO in the demo is shown below. The main server, written in Node.js, spawns this Python process and talks to it over STDIN/STDOUT: it writes the path of an image file as one line, and the script runs detection on that file and prints the results.

            
                import os
                import sys
                from ultralytics import YOLO
                
                # Load the small pretrained YOLOv8 model once, at startup
                model = YOLO("yolov8n.pt")
                
                # The parent Node.js process writes one image path per line to our STDIN.
                # Iterating over sys.stdin blocks until the next line arrives, which is what we want.
                for line in sys.stdin:
                    print("got something on stdin", flush=True)
                    filePathAndName = line.strip()
                    print("it was", filePathAndName, flush=True)
                    if os.path.isfile(filePathAndName):
                        print("and it was a file", flush=True)
                        # Run detection; results is a list with one Results object per image
                        results = model(filePathAndName)
                        print(results, flush=True)
                    else:
                        print("it was not a file", flush=True)
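The parent side of this protocol lives in the demo's Node.js server (in Node.js this is typically done with child_process.spawn). For illustration, and to keep the new examples in one language, here is a minimal Python sketch of the same spawn-and-talk pattern using subprocess. The child is a hypothetical stand-in (CHILD_SCRIPT) that answers every path with exactly one line, so the example runs without ultralytics installed; start_child() and request() are names made up for this sketch.

```python
import subprocess
import sys

# Hypothetical stand-in for the YOLO script: reads one path per line and
# answers with exactly one line, flushing after every reply.
CHILD_SCRIPT = """
import os, sys
for line in sys.stdin:
    path = line.strip()
    print("file" if os.path.isfile(path) else "not a file", flush=True)
"""


def start_child():
    # Text-mode pipes so we can write and read str lines directly
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_SCRIPT],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )


def request(proc, path):
    """Send one image path to the child and return its one-line reply."""
    proc.stdin.write(path + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().strip()


if __name__ == "__main__":
    child = start_child()
    print(request(child, "/no/such/image.jpg"))  # prints: not a file
    child.stdin.close()  # EOF ends the child's "for line in sys.stdin" loop
    child.wait()
```

The one-line reply is a deliberate simplification: the YOLO script above prints several lines per request (debug messages plus the multi-line results), so a real parent needs some way to tell where each reply ends, for example by having the script emit a single machine-readable line per image.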

Grabbing frames from OpenCV

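The following code uses OpenCV 3 to grab camera frames and make them available to the calling process via the file system. This process and the process that spawns it communicate over STDIN/STDOUT: whenever the parent writes to this script's STDIN, the script captures a frame, saves it as <camera number>.jpg, and writes a single "+" to STDOUT to signal that the file is ready.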

            
                import argparse
                import fcntl
                import os
                import select
                import sys
                import time
                
                import cv2
                
                # Make STDIN non-blocking so os.read() never stalls the capture loop
                flags = fcntl.fcntl(sys.stdin.fileno(), fcntl.F_GETFL)
                fcntl.fcntl(sys.stdin.fileno(), fcntl.F_SETFL, flags | os.O_NONBLOCK)
                
                parser = argparse.ArgumentParser(description='Start grabbing and processing frames from a camera')
                parser.add_argument("-c", "--camera", dest="cameraNumber", required=True,
                                    help="which camera (1..99) to connect to")
                args = parser.parse_args()
                
                cameraNumber = str(args.cameraNumber)
                
                # The latest frame for each camera is written to <cameraNumber>.jpg
                filePath = "/home/eyesbot/Code/chariot/control-server/" + cameraNumber + ".jpg"
                
                def startGrabbingAndProcessingFrames():
                    cap = cv2.VideoCapture(int(cameraNumber))
                    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 480)
                    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 360)
                    # Keep only one frame in the capture buffer so each grab is as fresh as possible
                    cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)
                
                    while True:
                        time.sleep(0.01)
                
                        # Poll STDIN without waiting; any input from the parent requests a frame
                        if select.select([sys.stdin], [], [], 0.0)[0]:
                            buf = os.read(sys.stdin.fileno(), 10)
                
                            if not buf:
                                # An empty read means the parent closed our STDIN: stop capturing
                                break
                
                            ret, frame = cap.read()
                            if ret:
                                cv2.imwrite(filePath, frame)
                                # Signal the parent that the frame file is ready
                                sys.stdout.write("+")
                                sys.stdout.flush()
                
                    cap.release()
                
                startGrabbingAndProcessingFrames()
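On the parent side the handshake is: write any bytes to the grabber's STDIN, wait for the single "+", then read the JPEG file. The demo's parent is the Node.js server; the Python sketch below only illustrates the sequence. To run without a camera or OpenCV it uses a hypothetical stand-in grabber (CHILD_SCRIPT) that writes placeholder bytes instead of a real frame; start_grabber(), grab_frame() and the file names are made up for this sketch.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the frame grabber: each read from STDIN "captures"
# a frame (writes placeholder bytes to the path given as argv[1]), then replies "+".
CHILD_SCRIPT = r"""
import os, sys
frame_path = sys.argv[1]
count = 0
while True:
    buf = os.read(sys.stdin.fileno(), 10)
    if not buf:
        break  # parent closed our STDIN
    count += 1
    with open(frame_path, "wb") as f:
        f.write(b"frame %d" % count)
    sys.stdout.write("+")
    sys.stdout.flush()
"""


def start_grabber(frame_path):
    # bufsize=0 gives unbuffered binary pipes, so each request is sent immediately
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_SCRIPT, frame_path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        bufsize=0,
    )


def grab_frame(proc, frame_path):
    """Request one frame, wait for the '+' acknowledgement, then read the file."""
    proc.stdin.write(b"g")  # any bytes count as a request
    if proc.stdout.read(1) != b"+":
        raise RuntimeError("frame grabber exited or sent an unexpected reply")
    with open(frame_path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "0.jpg")
        grabber = start_grabber(path)
        print(grab_frame(grabber, path))  # prints: b'frame 1'
        grabber.stdin.close()  # EOF makes the stand-in leave its loop
        grabber.wait()
```

Waiting for each "+" before sending the next request matters: the grabber reads up to 10 bytes per poll, so several requests written back-to-back can be consumed by a single read and produce only one frame.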

References

To see the standard list of REST endpoints that the robot supports, please refer to the table on this page.