
Monday Nov 04, 2024
EP55: Why don’t we have super robots that do all of our dirty work for us yet…
Summary:
In this episode we explore the challenges of creating super robots that can perform human tasks, focusing specifically on computer vision. We rely on Kai-Fu Lee’s book “AI 2041” to explain that computer vision involves teaching computers to "see" not just by capturing images, but by understanding what they see. This process encompasses several levels of complexity, ranging from basic image processing to scene comprehension. We highlight that humans effortlessly apply knowledge of the world to their vision, but teaching this to a computer is a major challenge addressed by innovative technologies like convolutional neural networks.
Questions to consider as you read/listen:
1. What are the major challenges in developing robots that can perform complex tasks?
2. How does computer vision contribute to the development of intelligent robots?
3. What are the key differences between human vision and computer vision?
Long format:
One of the most difficult aspects of getting robots to “work” is combining dexterity with “seeing” (computer vision).
Computer vision (CV) is a subfield of AI that focuses on the problem of teaching computers to see. The word “see” here does not mean just the act of acquiring a video or image, but also making sense of what the computer sees. Computer vision includes the following capabilities, in increasing order of complexity:
Image capturing and processing—use cameras and other sensors to capture real-world 3D scenes in a video. Each video is composed of a sequence of images, and each image is a two-dimensional array of numbers representing color and brightness; each element of the array is a “pixel.”
Object detection and image segmentation—divide the image into prominent regions and find where the objects are.
Object recognition—recognize the object (for example, a dog), and also understand the details (German Shepherd, dark brown, and so on).
Object tracking—follow moving objects across consecutive images or video frames.
Gesture and movement recognition—recognize movements, like a dance move in an Xbox game.
Scene understanding—understand a full scene, including subtle relationships, like a hungry dog looking at a bone.
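To make the first two levels above concrete, here is a minimal sketch (using NumPy, with a made-up toy “image”) of an image as a 2D array of pixel values, plus a naive threshold-based segmentation that locates a bright object. This is an illustration of the idea, not how production vision systems work:

```python
import numpy as np

# A toy 8x8 grayscale "image": a 2D array of numbers, one per pixel.
# 0 = black background; a bright 3x3 object has value 255.
image = np.zeros((8, 8), dtype=np.uint8)
image[2:5, 3:6] = 255

# Naive image segmentation: split pixels into "object" vs "background"
# by thresholding their intensity.
mask = image > 128

# Naive object detection: the bounding box of the segmented region.
rows, cols = np.where(mask)
box = (int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max()))
print(box)  # (top, left, bottom, right) of the detected object
```

Real systems replace the hand-picked threshold with learned models, but the underlying data is the same: a grid of numbers.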
When we humans “see,” we are actually applying our accumulated knowledge of the world—everything we’ve learned in our lives about perspective, geometry, common sense, and what we have seen previously. These come naturally to us but are very difficult to teach a computer. One major breakthrough that addressed this challenge is the invention of convolutional neural networks (CNNs).
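The key idea behind a CNN can be illustrated by its core building block, the 2D convolution: a small filter slides across the image and responds strongly wherever its pattern appears. Here is a minimal NumPy sketch with a hand-picked vertical-edge filter (in a real CNN, the filter values are learned from data, not chosen by hand):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (technically cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # Multiply the filter against each image patch and sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: left half dark (0), right half bright (1) -> a vertical edge.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# A vertical-edge filter: responds where intensity changes left-to-right.
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

response = conv2d(image, kernel)
print(response)  # nonzero only in the column where the edge sits
```

Stacking many such learned filters, layer after layer, is what lets a CNN build up from edges to textures to whole objects.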
Source:
AI 2041 by Kai-Fu Lee