
Monday Nov 04, 2024
EP55: Why don’t we have super robots that do all of our dirty work for us yet…
Summary:
In this episode we explore the challenges of creating super robots that can perform human tasks, focusing specifically on computer vision. We rely on Kai-Fu Lee’s book “AI 2041” to explain that computer vision involves teaching computers to "see" not just by capturing images, but by understanding what they see. This process encompasses several levels of complexity, ranging from basic image processing to scene comprehension. We highlight that humans effortlessly apply knowledge of the world to their vision, but teaching this to a computer is a major challenge addressed by innovative technologies like convolutional neural networks.
Questions to consider as you read/listen:
1. What are the major challenges in developing robots that can perform complex tasks?
2. How does computer vision contribute to the development of intelligent robots?
3. What are the key differences between human vision and computer vision?
Long format:
One of the most difficult aspects of getting robots to “work” is combining dexterity with “seeing” (computer vision).
Computer vision (CV) is a subfield of AI that focuses on the problem of teaching computers to see. The word “see” here does not mean just the act of acquiring a video or image, but also making sense of what the computer sees. Computer vision includes the following capabilities, in increasing order of complexity:
Image capturing and processing—use cameras and other sensors to capture real-world 3D scenes in a video. Each video is composed of a sequence of images, and each image is a two-dimensional array of numbers representing color and brightness; each element of the array is a “pixel.”
Object detection and image segmentation—divide the image into prominent regions and find where the objects are.
Object recognition—recognize the object (for example, a dog), and also understand the details (German Shepherd, dark brown, and so on).
Object tracking—follow moving objects across consecutive images or video frames.
Gesture and movement recognition—recognize movements, like a dance move in an Xbox game.
Scene understanding—understand a full scene, including subtle relationships, like a hungry dog looking at a bone.
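To make the first two levels above concrete, here is a minimal sketch (using NumPy, with a made-up toy “image”) of an image as a 2D array of pixel values, plus a naive threshold-based segmentation that locates a bright object. This is an illustration of the idea, not how production vision systems work:

```python
import numpy as np

# A toy 8x8 grayscale "image": a 2D array of numbers, one per pixel.
# 0 = black background; a bright 3x3 object has value 255.
image = np.zeros((8, 8), dtype=np.uint8)
image[2:5, 3:6] = 255

# Naive image segmentation: split pixels into "object" vs "background"
# by thresholding their intensity.
mask = image > 128

# Naive object detection: the bounding box of the segmented region.
rows, cols = np.where(mask)
box = (int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max()))
print(box)  # (top, left, bottom, right) of the detected object
```

Real systems replace the hand-picked threshold with learned models, but the underlying data is the same: a grid of numbers.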
When we humans “see,” we are actually applying our accumulated knowledge of the world—everything we’ve learned in our lives about perspective, geometry, common sense, and what we have seen previously. These come naturally to us but are very difficult to teach a computer. One major breakthrough that addressed this challenge is the invention of convolutional neural networks (CNNs).
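The key idea behind a CNN can be illustrated by its core building block, the 2D convolution: a small filter slides across the image and responds strongly wherever its pattern appears. Here is a minimal NumPy sketch with a hand-picked vertical-edge filter (in a real CNN, the filter values are learned from data, not chosen by hand):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (technically cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # Multiply the filter against each image patch and sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: left half dark (0), right half bright (1) -> a vertical edge.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# A vertical-edge filter: responds where intensity changes left-to-right.
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

response = conv2d(image, kernel)
print(response)  # nonzero only in the column where the edge sits
```

Stacking many such learned filters, layer after layer, is what lets a CNN build up from edges to textures to whole objects.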
Source:
AI 2041 by Kai-Fu Lee