Have you ever wondered how self-driving
cars such as Teslas are able to navigate all kinds of roads with such ease,
precision, and accuracy, essentially allowing individuals to sit
back and relax in a car that has no human operating it?
How is a car able to see the road,
make sense of the entities that it's seeing, and maneuver accordingly?
A car obviously doesn't have eyes like a human does, so how can it see things
like pedestrians, traffic lights, stop signs, road lines, and other road signs?
Well, it turns out that cars do have eyes,
but not in the form that we're familiar with.
Computer vision is the field of artificial intelligence that gives machines
the ability to see the environment around them.
It trains computers to understand and interpret the visual world.
Utilizing digital images from cameras, videos, and machine learning models,
machines are given the power to identify
and classify objects, giving them the ability to react to all that they "see."
Now, before we get into the nuances of how this fascinating field works,
let's travel back in time and observe computer vision in its early days.
In the 1950s, the first experiments in computer vision were conducted.
Neural networks were used to detect object edges
and interpret simple handwritten text.
As big data surged in the 1990s,
large sets of images of people and things emerged on the Internet,
and machines could identify specified objects in photos and videos.
Computer vision has now flourished given the abundance of photos and videos
in our society, a form of big data, as well as advanced hardware,
software, and algorithms. So how exactly does computer vision work?
Well, think of it like a puzzle.
You have with you several different kinds
of pieces that all fit together in some way to form a complete image.
You look at the edges and the individual elements
of each puzzle piece to perceive which components fit together
and approximately where they should be placed to create a cohesive whole.
This is analogous to the process that a machine, specifically a neural
network, goes through when trying to understand visual images.
It identifies edges and borders and
tries to model subcomponents.
Instead of being given a complete image, as humans usually are
on top of the puzzle box, it is trained using hundreds of thousands of similar images,
and these images exist thanks to the big data that's available
in society, as we touched on earlier in this video.
Now if you'd like to learn more about big
data, I have a video that covers the basic components of this concept.
And if you'd like to know more about how
the machine actually trains its algorithm using the hundreds
of thousands of images it receives through machine learning, I'd recommend you check
out my video on this subfield of artificial intelligence.
Right now, let's take a look at how a specific architecture of neural networks,
called convolutional neural networks, powers computer vision.
So we all know that images are made up of big grids of pixels.
Each pixel has a designated color on the red-green-blue (RGB) scale, where
the three primary colors are combined in various ways to represent diverse colors.
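To make the pixel grid concrete, here's a minimal sketch in plain Python. The 2x2 image and its pixel values are made up for illustration, and `to_grayscale` is a hypothetical helper that uses a simple channel average (real libraries typically weight the channels differently).

```python
# A tiny 2x2 "image": each pixel is a (red, green, blue) triple,
# with every channel ranging from 0 to 255. Values are illustrative only.
image = [
    [(255, 0, 0), (0, 255, 0)],    # a red pixel, a green pixel
    [(0, 0, 255), (255, 255, 0)],  # a blue pixel, a yellow pixel (red + green)
]

def to_grayscale(img):
    """Average the three channels of each pixel into one brightness value.

    This plain average is a simplifying assumption for the sketch.
    """
    return [[sum(pixel) // 3 for pixel in row] for row in img]

height = len(image)
width = len(image[0])
gray = to_grayscale(image)

print(f"{height}x{width} image, grayscale values: {gray}")
```

Collapsing color down to a single brightness value per pixel keeps the later kernel arithmetic easy to follow.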
To identify features in images,
computer vision considers small patches of pixels through a small grid of numbers
called a kernel or filter, which contains values for pixel-wise multiplication.
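Here's a rough sketch of that pixel-wise multiplication on a single 3x3 patch. The kernel below is a hand-picked vertical-edge detector chosen for illustration (a real network would learn its kernel values), and `apply_kernel` is a hypothetical helper name.

```python
# A 3x3 patch of grayscale pixel values: dark on the left, bright on the right.
patch = [
    [0, 0, 255],
    [0, 0, 255],
    [0, 0, 255],
]

# A hand-picked kernel that responds to dark-to-bright vertical edges.
# This is an illustrative assumption; trained networks learn these values.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

def apply_kernel(patch, kernel):
    """Multiply each pixel by the kernel value at the same position, then sum."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )

# A patch containing an edge produces a large response...
edge_response = apply_kernel(patch, kernel)

# ...while a patch of uniform brightness produces none.
flat_patch = [[100, 100, 100] for _ in range(3)]
flat_response = apply_kernel(flat_patch, kernel)

print(edge_response, flat_response)
```

A large result means the patch looks like what the kernel is tuned for, and a result near zero means it doesn't.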
So if you've watched my video on neural networks, you might recall
that an artificial neuron, the basic component of neural nets,
takes in a series of inputs, multiplies the inputs by specified weights,
adds a bias, and learns from its mistakes through backpropagation.
These input weights are analogous to kernel values, where neural nets learn
useful kernels that are able to recognize unique features in images.
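As a minimal sketch of that idea, here's a single neuron with no activation function, trained by gradient descent on one made-up example. This is only the one-neuron case of what backpropagation does across many layers. The names `neuron` and `train_step`, the learning rate, and the data are all assumptions for illustration.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias (no activation, to keep it simple)."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def train_step(inputs, target, weights, bias, learning_rate=0.1):
    """One gradient-descent update on the squared error 0.5 * (prediction - target)**2."""
    error = neuron(inputs, weights, bias) - target
    # The gradient with respect to each weight is error * input;
    # with respect to the bias it is just the error.
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias

# Made-up training example: we want the neuron to output 1.0 for these inputs.
inputs, target = [1.0, 2.0], 1.0
weights, bias = [0.0, 0.0], 0.0

initial_error = abs(neuron(inputs, weights, bias) - target)
for _ in range(50):
    weights, bias = train_step(inputs, target, weights, bias)
final_error = abs(neuron(inputs, weights, bias) - target)

print(f"error before: {initial_error:.4f}, after: {final_error:.2e}")
```

In a convolutional network, the numbers being nudged this way are the kernel values themselves.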
Convolutional neural nets utilize a range
of preexisting neurons to process each new image.
With each layer, the prior image is digested
and manipulated by different learned kernels and a new image is output.
This output is then processed by the next layer of neurons, resulting in repeated
convolutions. Connecting this to our puzzle example,
the first convolution might discover edges.
Then the next layer might convolve
on the edge features to detect simple shapes made up of edges or corners.
Then the next layer might convolve on the corner features and utilize neurons
that can detect simple entities such as noses and mouths.
This process repeats and grows in complexity with every passing layer
until the machine reaches a layer that can recognize all parts of the image,
for example the eyes, nose, ears, and mouth, and deem the image a face.
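The layer-by-layer idea can be sketched with two stacked convolutions in plain Python. Everything here is a simplifying assumption: the 5x5 image, the hand-picked kernels (a trained network learns its own), and the helper names `convolve2d` and `relu`. Like most deep learning libraries, the sketch slides the kernel without flipping it.

```python
def convolve2d(image, kernel):
    """Slide the kernel over every position where it fully fits (no padding)
    and return the grid of multiply-and-sum responses."""
    k_height, k_width = len(kernel), len(kernel[0])
    out_height = len(image) - k_height + 1
    out_width = len(image[0]) - k_width + 1
    return [
        [
            sum(
                image[row + i][col + j] * kernel[i][j]
                for i in range(k_height)
                for j in range(k_width)
            )
            for col in range(out_width)
        ]
        for row in range(out_height)
    ]

def relu(feature_map):
    """Keep positive responses and zero out negative ones."""
    return [[max(0, value) for value in row] for row in feature_map]

# A 5x5 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Layer 1: a hand-picked kernel that responds to vertical edges.
edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
# Layer 2: a hand-picked kernel that pools neighboring edge responses,
# standing in for "shapes made up of edges".
combine_kernel = [
    [1, 1],
    [1, 1],
]

layer1 = relu(convolve2d(image, edge_kernel))      # 5x5 image -> 3x3 edge map
layer2 = relu(convolve2d(layer1, combine_kernel))  # 3x3 edge map -> 2x2 map

print(layer1)
print(layer2)
```

Each layer's output is just another grid of numbers, which is why the next layer can treat it as a new image and build on it.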
So in summary, computer vision works
by acquiring an image, processing the image and understanding the image.
For more videos on your journey towards mastering AI, be sure to subscribe to /acadaimy.
Thanks for watching!