Re: Associative memory for navigation
- From: Chad Johnson <chad.d.johnson.work@xxxxxxxxx>
- Date: Fri, 07 Sep 2007 04:51:41 -0000
On Sep 6, 11:30 pm, c...@xxxxxxxx (Curt Welch) wrote:
Chad Johnson <chad.d.johnson.w...@xxxxxxxxx> wrote:
If you try to train the robot by example, you have to give it a lot of
examples for it to work. For example, if you point at the fridge and
tell it, that's the refrigerator, what's going to stop it from thinking
that all large white areas are to be known as "the refrigerator"? Or
what if there's a magnet on the fridge and the robot makes the
assumption the magnet is the fridge? So when you say go to the fridge,
the robot runs over to the white-board which also has magnets on it.
Before we can understand such a message, we need to create some parsing
of the environment into objects and have a large base of commonsense
experience about what the person is most likely to be trying to
communicate to us.
The basic concept of association is very simple and powerful and a
fundamental part of what makes humans intelligent. But the hard part
of the problem is understanding how to decode raw sensor data into
"things" that the associations can be made with. I don't know of any
AI projects that have really solved that part of the problem.
When we, as humans, look at a kitchen, we don't just see raw 2D pixel
data. We see a 3D room full of 3D objects. Before you can use basic
high level associations like telling the robot the big white box is a
refrigerator, the robot first has to decode that raw 2D data into a
description of a 3D room full of objects, so that the fridge you have in
mind is an object the robot already recognizes before you try to give it a
name.
Isn't being able to distinguish objects only going to be an issue if
an object's location changes or if the object is moving? What is the
disadvantage if an object is identified based on the image and radar
readings of it and its surrounding environment? It's less human-like,
yes, but would the robot not still be able to locate the object?
My point was more to do with the simple idea that the robot wouldn't be
able to understand your question. It wouldn't be able to learn what object
you were talking about if it didn't have a concept of objects that was
close to your concept of objects.
On a different note, about the distinguishing objects from one another
and the environment: in my room I have a bookcase, and to the right I
have two computers stacked on top of one another. How do I know that
the computers are not part of the bookshelf -- that I have 3 separate
objects? Some things I notice are (and I am sort of just thinking to
myself here):
* The bookshelf area is colored differently than the computer area
* The bookshelf does not have buttons or lights or internal shapes
like the computer does; the computer has distinctly different
features than the bookshelf area does.
Now, how do I know there are *two* computers in the computer area of the
image data rather than just one? The computers look different, but
suppose they looked exactly the same and were aligned perfectly so
that they looked like one object. I would have a more difficult time
realizing that there are two computers in that area and not just one.
It would take me a little longer to make this determination, and I
think there is the chance that I may not realize that there actually
are two objects. So to make this determination, I think the biggest
factor would be that I would realize that I am seeing the same or
similar thing twice, and with the number two in my mind, I would
likely poll any existing knowledge about computer cases, and, assuming
I had any, I would likely determine that the height of the area is too
tall to be just one computer. So basically I'm doing a size comparison
against my existing computer casing-related knowledge and determining
whether any cases I've seen have ever been that tall. If the results
are around 50/50, I may inspect the cases closer (e.g. try separating
the two repeated areas).
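To make the size comparison above concrete, here is a minimal sketch. It assumes remembered case heights are roughly normally distributed and that every stack count is equally likely beforehand; the function names and the heights in centimetres are hypothetical, made up purely for illustration.

```python
import math
from statistics import mean, stdev


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def stack_count_posterior(observed_height, known_heights, max_count=3):
    """Probability that a region of the observed height holds 1..max_count cases.

    Assumes (for illustration only) independent, normally distributed case
    heights and an equal prior over every count.
    """
    mu = mean(known_heights)
    sigma = stdev(known_heights)
    # k stacked cases: mean height k*mu, standard deviation sigma*sqrt(k).
    likelihoods = {
        k: gaussian_pdf(observed_height, k * mu, sigma * math.sqrt(k))
        for k in range(1, max_count + 1)
    }
    total = sum(likelihoods.values())
    return {k: v / total for k, v in likelihoods.items()}


# Hypothetical heights (cm) of computer cases seen in the past.
remembered_case_heights_cm = [40, 42, 43, 45, 44]

posterior = stack_count_posterior(86, remembered_case_heights_cm)
best_guess = max(posterior, key=posterior.get)
print(best_guess, posterior)
```

If the probabilities came out close to even, that would correspond to the "50/50" case above, where a closer inspection is needed.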
It seems that distinguishing whether an area in an image is one or
multiple objects involves closeness matching. I very much bet this
could be done with some algorithm. Maybe one that could separate the
image data into multiple areas based on various hard-coded (or even
learned) characteristics, such as size, color, shape, shadows,
internal characteristics (shapes, colors) etc. Then other non-image
inputs could be used as complements, such as physical dimensions from
radar.
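As a rough sketch of the "separate the image into areas" idea, the following groups neighbouring pixels whose brightness is similar, using a plain flood fill. The tiny grayscale grid, the tolerance value, and the function name are all assumptions for illustration; a real system would use many more cues than brightness.

```python
from collections import deque


def segment_by_color(image, tolerance=10):
    """Label 4-connected regions of pixels whose neighbours differ by <= tolerance.

    `image` is a list of rows of grayscale values. Returns (labels, region_count).
    """
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue  # already assigned to a region
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == -1
                            and abs(image[ny][nx] - image[y][x]) <= tolerance):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels, next_label


# Hypothetical scene: bookshelf (value 100) on the left, two identical
# computers (value 200) stacked on the right.
scene = [
    [100, 100, 200, 200],
    [100, 100, 200, 200],
    [100, 100, 200, 200],
    [100, 100, 200, 200],
]

labels, region_count = segment_by_color(scene)
print(region_count)
```

Note that the two identical stacked computers come out as a single region here, which is exactly the difficulty described above: appearance alone cannot tell them apart.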
Any thoughts? :)
You just can't know that there are two computers there simply by looking at
the image. How, for example, do you know that those two computers stacked
on top of each other are not actually a foam sculpture carefully shaped and
painted to look exactly like two computers stacked together?
When you look at those computers, your brain doesn't parse it as a foam
sculpture sitting next to a bookcase. Your brain parses it as two
computers, which means you can lift the top one, and the bottom one won't
move, and that they have an expected weight which is much heavier than a
foam sculpture.
Part of your ability to see those as two computers might, for you, come
from the fact that you put them there in the first place. But even if
I walked into your office, knowing nothing about it, I too would probably
see them as two computers, and not a single foam sculpture. This happens
because of my long experience interacting with similar environments. Every
time I've seen something like that in the past, and then interacted with
it, I found it had typical computer-like properties.
So the job of parsing an image into objects is not something you can do
very accurately without a long history of interacting with similar
environments in the past. Parsing the image into objects is a matter of
statistical probability, based on past experience.
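One simple way to picture "statistical probability based on past experience" is a frequency count with smoothing. This is only a sketch under assumed names and made-up counts, not a claim about how the brain or any real vision system does it.

```python
from collections import Counter, defaultdict


class ExperienceModel:
    """Remembers what similar-looking things turned out to be after interaction."""

    def __init__(self):
        self.outcomes = defaultdict(Counter)  # appearance -> outcome counts
        self.known_labels = set()

    def observe(self, appearance, outcome, times=1):
        """Record that something with this appearance turned out to be `outcome`."""
        self.outcomes[appearance][outcome] += times
        self.known_labels.add(outcome)

    def probabilities(self, appearance, smoothing=1.0):
        """Laplace-smoothed estimate, so unseen outcomes are unlikely but not impossible."""
        counts = self.outcomes[appearance]
        total = sum(counts.values()) + smoothing * len(self.known_labels)
        return {
            label: (counts[label] + smoothing) / total
            for label in self.known_labels
        }


# Hypothetical history of interacting with things that looked like this.
model = ExperienceModel()
look = "stacked boxes with buttons and lights"
model.observe(look, "two computers", times=48)
model.observe(look, "one tall computer", times=1)
model.observe(look, "foam sculpture", times=1)

beliefs = model.probabilities(look)
most_likely = max(beliefs, key=beliefs.get)
print(most_likely, round(beliefs[most_likely], 3))
```

The foam sculpture keeps a small nonzero probability, which matches the point that the parse is a best guess rather than a certainty.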
If you attempt to hard-code some computer algorithm to parse interior
scenes accurately into book cases, and books, and stacks of computers, you
will likely find that when that algorithm moves outside into a forest,
almost nothing was parsed correctly. You would have to make all sorts of
additions to the algorithm for it to even get close to correctly parsing
trees and shadows and leaves, and dogs with spots, and patches of snow on
the ground.
So, I think the key to getting a robot to understand sensory data like we
understand it, is developing adaptive decoding algorithms that tune
themselves based on experience.
In addition, a huge advantage we have is that we process temporal data. We
extract our information not from static data (like a photograph), but from
how the sensory data changes over time. Our expectations are based on
probabilistic predictions of how sensory data is likely to change over time
because of how the sensory data has changed in the past.
For example, when we see a book with a title printed on it, we see the book
as one item, instead of seeing the cover as one item, and the title as
being a different item. Part of why we can do this is our expectations of
how this image of the book is likely to change over time. If the book
moves to the right, we expect the title to move to the right at the same
time. If the title moves, we expect the book to move. We see it as one
object, because we expect the book, and the title, to have temporally
correlated motions. We expect them to change in similar ways, over time.
This is not something any algorithm could know without exposure to many
books in the past. How does it know the title is printed on the book, and
not just paper which is cut out and lying on top of the book? How does it
know the paper title might not blow off in a second with a slight gust of
wind? It knows it because everything it's seen in the past which looked
similar to this book, makes the sensory data decoding system predict that
the most likely event is for the title to move when the rest of the
features of the book move, so as a result, all those different visual
features get parsed as "one object". They are temporally predictive of
each other and that's what (I believe) causes the sensory data processing
system to associate them as being "one object" in the effective parse tree
of the pixel data.
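The "temporally correlated motion" idea can be sketched by grouping tracked features whose frame-to-frame movements are strongly correlated. Everything here is an assumption for illustration: the feature names, the 1D positions, and the 0.9 threshold. It also groups features from one observed motion sequence, whereas the argument above is about predictions learned from many past examples.

```python
import math
from itertools import combinations


def displacements(track):
    """Frame-to-frame movement of one feature along a single axis."""
    return [b - a for a, b in zip(track, track[1:])]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series never varies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def group_by_common_motion(tracks, threshold=0.9):
    """Merge features whose movements correlate at or above the threshold."""
    parent = {name: name for name in tracks}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    moves = {name: displacements(track) for name, track in tracks.items()}
    for a, b in combinations(tracks, 2):
        if pearson(moves[a], moves[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in tracks:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical horizontal positions of three visual features over six frames.
tracks = {
    "book_cover": [0, 1, 3, 3, 6, 7],
    "title": [2, 3, 5, 5, 8, 9],        # moves exactly with the cover
    "loose_paper": [4, 4, 3, 5, 5, 2],  # drifts on its own
}

groups = group_by_common_motion(tracks)
print(groups)
```

The cover and the title end up in one group because they always move together, while the loose paper is left as its own object.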
Yeah, I think past experience is definitely a critical factor. Thinking
about it more, I guess if I looked at the computers the first time (no
machine/computer experience) it would take me a while to figure out
there are two of these objects.
I'll have to keep thinking about these things. Thanks for being open
to discussing them with me though -- it would have taken me a lot of
time to come close to even considering many of the things you
mentioned.
I don't know how to actually implement this concept in hardware, but it's
the type of concept I believe we need to look at to understand how humans
deal with sensory data and how we can "understand" sensory data in terms of
objects.
--
Curt Welch http://CurtWelch.Com/
c...@xxxxxxxx http://NewsReader.Com/