How Computer Vision Works | Artificial Intelligence and Machine Learning

Published: 19 October 2024
on channel: Acadaimy

Have you ever wondered how self-driving

cars such as Tesla are able to navigate all kinds of roads with such ease,

precision, and accuracy, essentially allowing individuals to sit

back and relax in a car that has no human operating it?

How is a car able to see the road,

make sense out of the entities that it's seeing and maneuver accordingly?

A car obviously doesn't have eyes like a human does, so how can it see things

like pedestrians, traffic lights, stop signs, lane lines, and other road signs?

Well, it turns out that cars do have eyes,

but not in the form that we're familiar with.

Computer vision is the field of artificial intelligence that gives machines

the ability to see the environment around them.

It trains computers to understand and interpret the world

utilizing digital images from cameras, videos, and machine learning models.

Machines are given the power to identify

and classify objects, giving them the ability to react to all that they "see".

Now, before we get into the nuances of how this fascinating field works,

let's travel back in time and observe computer vision in its early days.

In the 1950s, the first experiments in computer vision were conducted.

Early neural networks were used to detect object edges,

and later systems learned to interpret simple typed and handwritten text.

As big data surged in the 1990s,

large sets of images of people and things emerged on the Internet,

and machines could identify specified objects in photos and videos.

Computer vision has now flourished given the abundance of photos and videos,

a form of big data, in our society, as well as advanced hardware,

software, and algorithms. So how exactly does computer vision work?

Well, think of it like a puzzle.

You have with you several different kinds

of pieces that all fit together in some way to form a complete image.

You look at the edges and the individual elements

of each puzzle piece to perceive which components fit together

and approximately where they should be placed to create a cohesive whole.

This is analogous to the process that machines, specifically neural

networks, go through when trying to understand visual images.

They identify edges and borders and

try to model subcomponents.

Instead of being given a complete image, as humans

usually are on top of the puzzle box, they are trained using hundreds of thousands of similar images,

and these images are available thanks to the big data

in society, as we touched on at the beginning of this video.

Now if you'd like to learn more about big

data, I have a video that covers the basic components of this concept.

And if you'd like to know more about how

the machine actually trains its algorithm using the hundreds

of thousands of images it receives through machine learning, I'd recommend you check

out my video on this subfield of artificial intelligence.

Right now, let's take a look at how a specific architecture of neural networks,

called convolutional neural networks, powers computer vision.

So we all know that images are made up of big grids of pixels.

Each pixel has a designated color on the red-green-blue (RGB) scale, where

the primary colors are combined in various ways to represent diverse colors.
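To make the pixel grid concrete, here is a minimal sketch of a tiny image stored as rows of (red, green, blue) values. The 8-bit range of 0 to 255 per channel and the `to_grayscale` helper are assumptions made for illustration, not something covered in the video.

```python
# A tiny 2x2 image as a grid (list of rows) of (red, green, blue) pixels.
# Assumption: each channel is an 8-bit value from 0 (off) to 255 (full).
image = [
    [(255, 0, 0), (0, 255, 0)],    # red, green
    [(0, 0, 255), (255, 255, 0)],  # blue, yellow (red + green combined)
]

def to_grayscale(pixel):
    """Average the three channels into one brightness value (hypothetical helper)."""
    r, g, b = pixel
    return (r + g + b) // 3

# One brightness number per pixel, keeping the same grid layout.
gray = [[to_grayscale(p) for p in row] for row in image]
print(gray)
```

Collapsing each pixel to a single brightness number is one common simplification before looking for features such as edges.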

To identify features in images,

computer vision considers small patches of pixels through a small grid of numbers

called a kernel or filter, which contains values for pixel-wise multiplication.
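The patch-by-patch multiplication described above can be sketched in a few lines of plain Python. The `convolve2d` function, the toy image, and the hand-picked `vertical_edge` kernel are all illustrative assumptions; like most neural network libraries, this sketch slides the kernel without flipping it.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the grid of
    multiply-and-sum results, one value per patch position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # Pixel-wise multiplication of the patch with the kernel values.
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Toy grayscale image: dark on the left (0), bright on the right (9).
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

# Hand-picked kernel that responds where the right side of a patch
# is brighter than the left side, i.e. at a vertical edge.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

edges = convolve2d(image, vertical_edge)
print(edges)  # [[0, 27, 27], [0, 27, 27]]
```

The output is zero over the flat dark region and large wherever a patch straddles the dark-to-bright boundary, which is what "detecting an edge" means here.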

So if you've watched my video on neural networks, you might recall

that an artificial neuron, the basic component of neural nets,

takes in a series of inputs, multiplies the inputs by specified weights,

adds a bias, and uses backpropagation to learn from mistakes.
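As a rough sketch of that single neuron, the function below computes a weighted sum of its inputs and adds a bias. The ReLU activation and the example numbers are assumptions for illustration, and the learning step (backpropagation) is not shown here.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through ReLU.
    Assumption: ReLU is used as the activation; other choices exist."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, total)

# Hypothetical inputs, weights, and bias chosen only for the example.
output = neuron([1.0, 2.0, 3.0], [0.5, -1.0, 0.5], 0.25)
print(output)  # 0.25
```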

These input weights are analogous to kernel values, where neural nets learn

useful kernels that are able to recognize unique features in images.
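To illustrate what "learning a kernel" can look like, here is a toy sketch that starts from an all-zero two-value kernel and nudges it with plain gradient descent until it reproduces the output of a hand-written difference kernel. The signal, the learning rate, and the step count are assumptions picked for this example; real networks learn many kernels at once, with backpropagation supplying the gradients.

```python
def predict(signal, kernel):
    """Apply a 2-value kernel to every adjacent pair in a 1-D signal."""
    return [kernel[0] * signal[i] + kernel[1] * signal[i + 1]
            for i in range(len(signal) - 1)]

signal = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0]
# Target: what the hand-written "difference" kernel [-1, 1] would output.
target = predict(signal, [-1.0, 1.0])

kernel = [0.0, 0.0]    # start with no knowledge of the feature
learning_rate = 0.01   # hypothetical value that works for this toy signal

for _ in range(2000):
    output = predict(signal, kernel)
    errors = [o - t for o, t in zip(output, target)]
    n = len(errors)
    # Gradient of the mean squared error with respect to each kernel value.
    grad0 = sum(2 * e * signal[i] for i, e in enumerate(errors)) / n
    grad1 = sum(2 * e * signal[i + 1] for i, e in enumerate(errors)) / n
    # Move each kernel value a small step in the direction that reduces error.
    kernel[0] -= learning_rate * grad0
    kernel[1] -= learning_rate * grad1

print([round(k, 3) for k in kernel])
```

After training, the learned values should sit very close to -1 and 1, meaning the kernel has recovered the difference feature purely from examples.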

Convolutional neural nets utilize banks

of these trained neurons to process each new image.

With each layer, the prior image is digested

and manipulated by different learned kernels and a new image is output.

This output is then processed by the next layer of neurons resulting in repeated

convolutions. Connecting this to our puzzle analogy,

the first convolution might discover edges.

Then the next layer might convolve

on the edge features to detect simple shapes made up of edges or corners.

Then the next layer might convolve on the corner features and utilize neurons

that can detect simple entities such as noses and mouths.

This process repeats and grows in complexity with every passing layer

until the machine reaches a layer that can recognize all parts of the image,

for example the eyes, nose, ears, and mouth, and deem the image a face.
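The layer-on-layer idea can be sketched by feeding the output of one convolution into the next. Everything here is a simplified assumption: a single channel, two hand-picked kernels instead of learned ones, and a ReLU step between layers. It only shows the mechanics of repeated convolutions, not a real face detector.

```python
def convolve2d(image, kernel):
    """Multiply-and-sum `kernel` over every patch of `image` (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]

def relu(feature_map):
    """Keep positive responses and zero out negative ones."""
    return [[max(0, v) for v in row] for row in feature_map]

# 6x6 toy image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

edge_kernel = [[-1, 0, 1]] * 3   # layer 1: responds to vertical edges
sum_kernel = [[1, 1, 1]] * 3     # layer 2: pools nearby edge responses

layer1 = relu(convolve2d(image, edge_kernel))   # 6x6 image -> 4x4 edge map
layer2 = relu(convolve2d(layer1, sum_kernel))   # 4x4 edge map -> 2x2 map

print(layer1[0])  # [0, 3, 3, 0]
print(layer2)     # [[18, 18], [18, 18]]
```

Each layer sees only the previous layer's output, so later layers respond to combinations of earlier features rather than to raw pixels.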

So in summary, computer vision works

by acquiring an image, processing the image and understanding the image.

For more videos on your journey towards mastering AI, be sure to subscribe to Acadaimy.

Thanks for watching!