Hi, I'm a researcher in computer vision (an electronics engineer by profession) designing a system intended to outperform the current state-of-the-art vision systems. OpenCV 2.2 did not impress me; machine vision seems to lag behind the simplest animal you can think of (a cat, for example). I think computers are powerful enough to handle vision nearly as well as humans. Why are state-of-the-art vision systems so task-specific and not as robust as they should be? Any suggestions?
|
|
|
|
|
Let's see your system and then we can judge.
|
|
|
|
|
Well, first I have to deal with patent issues, and I'm also writing a journal paper on it. I have written a proprietary vision library and will be ready to show my system to the world when all the legal issues are settled and I finalise the design. These legal issues make innovation very difficult.
|
|
|
|
|
BCDXBOX360 wrote: I'm designing a system intended to outperform the current state-of-the-art vision systems.
BCDXBOX360 wrote: These legal issues make innovation very difficult.
You seem to have two conflicting statements here.
|
|
|
|
|
Richard MacCutchan wrote: You seem to have two conflicting statements here.
Maybe I should have written that the legal process of getting patents and other rights to an invention discourages innovation but does not make it impossible.
|
|
|
|
|
I rather meant that, having claimed that you were going to create a state-of-the-art system that would beat anything currently available, you are now saying that you cannot do it because of the difficulty of getting a patent. That sounds like an excuse, not a reason.
|
|
|
|
|
Richard MacCutchan wrote: I rather meant that, having claimed that you were going to create a state-of-the-art system that would beat anything currently available, you are now saying that you cannot do it because of the difficulty of getting a patent. That sounds like an excuse, not a reason.
Okay, let's forget about the patent issues for now. I was initially worried about ideas being stolen, but right now, as I write this, I'm sitting in front of a laptop with a vision library (designed and coded by me) capable of outperforming the current state-of-the-art vision systems. (I'm just optimizing the library and adding some final touches.)
|
|
|
|
|
BCDXBOX360 wrote: I'm sitting in front of a laptop with a vision library (designed and coded by me) capable of outperforming the current state-of-the-art vision systems.
In that case I'll go back to my first comment: "Let us see it in action and then we can judge."
|
|
|
|
|
Richard MacCutchan wrote: In that case I'll go back to my first comment: "Let us see it in action and then we can judge."
Okay, I will make a video presentation as soon as I finish optimizing the library.
|
|
|
|
|
BCDXBOX360 wrote: Why are state-of-the-art vision systems so task-specific and not as robust as they should be?
Looks like you haven't even started your prestigious project.
Otherwise you would have seen that with "real" items there are no perfect matches to stored representations. Any animal is capable of recognizing a tree from the data its eyes send to its brain. Write software that will detect that tree in a bitmap, and then, in a picture of the same tree taken from a different place, recognize that it's the same tree...
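For a sense of scale, here is a minimal sketch of just the "same tree from a different place" half of that challenge, using OpenCV's ORB features and brute-force matching (assuming OpenCV 3+ Python bindings; tree1.jpg and tree2.jpg are hypothetical filenames, and a pile of good matches is only a hint, not recognition):

    import cv2

    # Two photos of (possibly) the same tree from different viewpoints
    # (hypothetical filenames)
    img1 = cv2.imread("tree1.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("tree2.jpg", cv2.IMREAD_GRAYSCALE)

    # Detect keypoints and compute binary descriptors in both images
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Brute-force Hamming matching with cross-checking to prune bad pairs
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    # Many low-distance matches suggest (but do not prove) the same object
    print(len(matches), "matches; best distance:",
          matches[0].distance if matches else None)

Even this leaves open everything the post actually asks for: deciding that the matched patches belong to a tree in the first place.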
|
|
|
|
|
I was tempted to say that, but thought I would give the OP the benefit of the doubt.
|
|
|
|
|
Yup, image recognition is still half good DSP and half black magic.
|
|
|
|
|
Well, I wanted other views from the CodeProject community on the limitations of computer/machine vision algorithms. I have been researching current developments in vision systems for two years now and have been iteratively refining my design over time based on new and promising heuristics of vision.
|
|
|
|
|
Bernhard Hiller wrote: Write software that will detect that tree in a bitmap, and then, in a picture of the same tree taken from a different place, recognize that it's the same tree...
It was found that neurons called view-tuned units exist in animal/human brains which encode only one view of a given object (in this case, a tree), and these feed into a view-invariant unit. The principal design criterion for my vision system is based on that same principle, but the secret is to encode those views in a time- and memory-efficient algorithm, similar to the one by
S. Hinterstoisser, V. Lepetit, S. Ilic, P. Fua, and N. Navab, "Dominant orientation templates for real-time detection of texture-less objects," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
They used different views of the same object, encoded in a very compact and efficient way; their method only works for texture-less objects, but it stays efficient even for a very large database of objects.
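A rough sketch of the flavour of that encoding (not the authors' actual code; the 8-bin quantization and the scoring rule are my assumptions): quantize each pixel's gradient orientation into one of 8 bins stored as a single set bit, so a template becomes one byte per location and matching reduces to bitwise ANDs:

    import cv2
    import numpy as np

    def orientation_bitmask(gray, bins=8):
        # Gradient orientation folded to [0, pi) so contrast polarity is ignored
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
        ang = np.mod(np.arctan2(gy, gx), np.pi)
        idx = np.minimum((ang / np.pi * bins).astype(np.uint8), bins - 1)
        return np.left_shift(np.uint8(1), idx)   # one set bit per pixel

    def template_score(template_mask, image_mask):
        # A location matches when the template's orientation bit is also
        # present in the image; the score is the fraction of matching cells
        return np.mean((template_mask & image_mask) != 0)

The actual paper adds orientation spreading and region-level dominant orientations, but the byte-per-cell bitmask is what makes the representation compact and fast to match.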
|
|
|
|
|
This is in response to all your posts.
I gather that you haven't started, or are in the early stages of, your project. I'm also working on some vision recognition stuff, but I'm pretty far along, and I can tell you that you'll find a lot more complications than you realize as you go. That's why a lot of systems are domain-specific: it lets them take advantage of certain known facts and "cheat", so to speak, since no one has created a general-purpose system yet. In addition to the difficulty one poster already mentioned, here are just a few of the other things you need to consider:
1) Defining the edges of objects: most objects in the real world have areas where the edges are blurred rather than sharp color changes. Look up Canny edge detection; it explains some of this.
2) Recognizing that two areas are part of the same object: consider a cat with black and white patches. How is a vision system supposed to know that two areas with radically different colors are part of the same object?
3) Depth perception: if you use two cameras, similar to our two eyes, you can match two objects and then compare the parallax shift. However, this only works at certain distances. Our brains probably only use it at short range; several other methods take over at long range, where the parallax shift isn't large enough to judge (see the sketch after this list).
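A back-of-the-envelope sketch of point 3 (the focal length and baseline here are made-up example values): depth follows from the parallax shift d as Z = f·B/d, so at long range, where d approaches a pixel or less, tiny matching errors swamp the estimate:

    # Triangulated depth from stereo parallax: Z = f * B / d, where
    # f = focal length in pixels, B = camera baseline in metres,
    # d = disparity (parallax shift) in pixels; values below are examples
    def depth_from_disparity(d_pixels, f_pixels=700.0, baseline_m=0.06):
        if d_pixels <= 0:
            return float("inf")   # no measurable shift: too far to triangulate
        return f_pixels * baseline_m / d_pixels

    # With these numbers, 1 px of disparity puts the target at 42 m, and a
    # half-pixel matching error moves it to 84 m: useless at long range.
    print(depth_from_disparity(1.0))   # 42.0
    print(depth_from_disparity(0.5))   # 84.0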
Also, why are you worried about patents at this stage? I doubt you'll get sued for simply experimenting with something. If your system does end up working and you want to commercialize it, then buy or license the rights from the existing patent holders who are in your way. You may also find that your idea changes a lot as you work on it and run into difficulties; mine did.
|
|
|
|
|
mikemarquard wrote: 1) Defining the edges of objects: most objects in the real world have areas where the edges are blurred rather than sharp color changes. Look up Canny edge detection; it explains some of this. 2) Recognizing that two areas are part of the same object: consider a cat with black and white patches. How is a vision system supposed to know that two areas with radically different colors are part of the same object? 3) Depth perception: if you use two cameras, similar to our two eyes, you can match two objects and then compare the parallax shift. However, this only works at certain distances. Our brains probably only use it at short range; several other methods take over at long range, where the parallax shift isn't large enough to judge.
1) I agree that my ideas will change over time, because they already have, but for the better. At first I tried edge-detection methods, but later realised that edge detection is not necessary: descriptors such as SIFT, SURF, DOT, HOG and many more use orientation, not contours. This is supported by biological vision in simple and complex cells, and my system follows the same trend. Orientation is not affected by blurring, and is thus more robust and descriptive (see the sketch after this list).
2) My system uses local image patches and a part-based recognition infrastructure without segmentation; since segmentation is a by-product of recognition, the vision system is not supposed to segment out scenes or potential objects before recognizing them.
3) My system is not currently designed to use stereo cameras; it uses a single camera and does not need depth or a 3D representation to aid recognition.
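A quick way to test the blur claim in point 1 for yourself (the file name and blur kernel are arbitrary choices, not from the poster's library): compare a HOG-style, magnitude-weighted orientation histogram before and after a Gaussian blur; it shifts far less than a raw edge map would:

    import cv2
    import numpy as np

    def orientation_histogram(gray, bins=9):
        # Magnitude-weighted histogram of unsigned gradient orientations
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
        mag = np.hypot(gx, gy)
        ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned, in [0, pi)
        hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
        return hist / (hist.sum() + 1e-9)         # normalise for comparison

    img = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical file
    blurred = cv2.GaussianBlur(img, (9, 9), 0)
    h_sharp, h_blur = orientation_histogram(img), orientation_histogram(blurred)
    print("L1 difference between histograms:", np.abs(h_sharp - h_blur).sum())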
My project has evolved in a real sense, and I'm using my own vision library to implement the system. I have figured out how to encode image data in an efficient and robust manner for building a generic object-recognition system. How do I know that it will work? Well, I have been progressively testing simple building blocks of the system, and I'm now certain it will work when the whole system is put together. I am optimizing my vision library for the final implementation, with probably months remaining before completion.
|
|
|
|
|
It sounds like your ideas and my ideas are a lot different. Actually, my ideas are a lot different from any of the ideas I've read about, and my image segmentation is technically not an edge-detection algorithm either. I wish you the best of luck, and if you have some big successes I'd love to hear about them.
BCDXBOX360 wrote: My system is not currently designed to use stereo cameras; it uses a single camera and does not need depth or a 3D representation to aid recognition.
You might have more limited aims than I do, but I have to question this one. If you're trying to build something capable of doing what a human or animal can do, I don't see how this can work, because humans and animals clearly see in 3D. Also, if you choose this path, keep in mind that objects look radically different from different views. Without some sort of 3D perception it is going to be difficult to get the system to recognize multiple views as belonging to the same object.
|
|
|
|
|
mikemarquard wrote: objects look radically different from different views. Without some sort of 3D perception it is going to be difficult to get the system to recognize multiple views as belonging to the same object.
That's why my system uses a multi-view representation, as I explained earlier. During the learning phase, multiple views of the same object are learned and efficiently encoded for fast retrieval. This is supported in biological vision: neurons called view-tuned units respond only to a single view of a given 3D object, but a collection of them gives view-invariant behaviour. My system also implements a knowledge-transfer technique for one-shot learning (this shrinks the training sets needed as the system learns more and more things, just like humans!). And animals/humans can see effectively with a single eye, suggesting that depth adds very little information (maybe little enough to be ignored for now). We see in what I call false 3D: it's only experience with this world that lets the brain encode multiple views of various objects and tricks us into thinking we see in 3D. The truth of the matter is that we work from a 2D representation, especially for recognition purposes. I think depth is used to tell more accurately how far the recognized object is from your eyes, but that information is not used in the actual recognition of the object.
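A toy illustration of that view-tuned/view-invariant wiring (the cosine-similarity "units" and 128-dimensional features are stand-ins, not the poster's actual encoding): each stored view acts as a view-tuned unit, and max-pooling over their responses gives a view-invariant score:

    import numpy as np

    def view_tuned_response(view_template, observation):
        # Stand-in for a view-tuned unit: cosine similarity to one stored view
        num = float(np.dot(view_template, observation))
        den = np.linalg.norm(view_template) * np.linalg.norm(observation) + 1e-9
        return num / den

    def view_invariant_score(view_templates, observation):
        # The "view-invariant unit": fires if any stored view responds
        # strongly, regardless of which viewpoint the observation came from
        return max(view_tuned_response(t, observation) for t in view_templates)

    views = [np.random.rand(128) for _ in range(8)]   # 8 stored views of one object
    query = views[3] + 0.05 * np.random.rand(128)     # a slightly perturbed view
    print(view_invariant_score(views, query))         # close to 1.0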
|
|
|
|
|
BCDXBOX360 wrote: My system also implements a knowledge-transfer technique for one-shot learning (this shrinks the training sets needed as the system learns more and more things, just like humans!). And animals/humans can see effectively with a single eye, suggesting that depth adds very little information (maybe little enough to be ignored for now).
Actually, it's been found that people born blind in one eye never develop proper depth perception. The reason you and I can still see in 3D when we cover an eye is that as children we learned other cues for judging depth. However, we needed two eyes to learn those cues, because without them we have very little information to accurately gauge where an object is, and thus no way to learn how the other cues correspond to a particular location.
I would say this: I don't know of any animal that has only one eye, so I think depth perception, and parallax shift, or stereopsis as I believe the proper term is, must be important.
PS: If you were the person who downvoted me, I'm not trying to be critical or discourage you; I just enjoy debating these topics with people of similar interests and hearing their opinions.
|
|
|
|
|
mikemarquard wrote: Actually, it's been found that people born blind in one eye never develop proper depth perception.
But they can recognize objects effectively, right? My main interest is recognition, so the question is how much depth perception affects the recognition of objects. I do know that 3D face recognition is more accurate than its 2D counterparts, but it requires 3D sensing, which puts a strain on CPUs. And what about all the 2D images and videos already available; how will your 3D system make use of them?
mikemarquard wrote: The reason you and I can still see in 3D when we cover an eye is that as children we learned other cues for judging depth.
Thank you, because that's my solution: my system learns those cues during the learning phase, by presenting multi-view training sets, as I wrote earlier.
mikemarquard wrote: PS: If you were the person who downvoted me, I'm not trying to be critical or discourage you; I just enjoy debating these topics with people of similar interests and hearing their opinions.
Don't worry, I'm not like that; I also enjoy discussing with people of similar interests. Besides, there aren't many daredevils willing to go down this path, so it's good to hear from people like you. I wish you luck in your endeavour.
|
|
|
|
|
BCDXBOX360 wrote: But they can recognize objects effectively, right? My main interest is recognition, so the question is how much depth perception affects the recognition of objects. I do know that 3D face recognition is more accurate than its 2D counterparts, but it requires 3D sensing, which puts a strain on CPUs. And what about all the 2D images and videos already available; how will your 3D system make use of them?
I would assume yes, because I've never seen anything written on that subject. So you are probably right that it is possible, but I still suspect it will learn faster with 3D.
BCDXBOX360 wrote: mikemarquard wrote: The reason you and I can still see in 3D when we cover an eye is that as children we learned other cues for judging depth.
Thank you, because that's my solution: my system learns those cues during the learning phase, by presenting multi-view training sets, as I wrote earlier.
Yeah, but that's the point I was making earlier. Without some pre-existing method for judging distances, you have nothing to use as a measuring stick when your system learns those cues. That's why people born blind in one eye don't learn them: they cannot use stereopsis as a measuring stick. They have no way to see how, for instance, the size of an object corresponds to its distance from the viewer, because they never actually know how far the object is from them.
BCDXBOX360 wrote: Don't worry, I'm not like that; I also enjoy discussing with people of similar interests. Besides, there aren't many daredevils willing to go down this path, so it's good to hear from people like you. I wish you luck in your endeavour.
Thanks
|
|
|
|
|
mikemarquard wrote: So you are probably right that it is possible, but I still suspect it will learn faster with 3D.
According to Wikipedia, "Stereopsis appears to be processed in the visual cortex in binocular cells having receptive fields in different horizontal positions in the two eyes. Such a cell is active only when its preferred stimulus is in the correct position in the left eye and in the correct position in the right eye, making it a disparity detector." You are right that 3D vision is useful. I will consider using two cameras, but I will start with a single camera and then move to a 3D implementation, which will let my system get the best of both worlds. Thanks for the advice. Thinking it through, I realised I had left stereopsis out, but there is room for it in my vision system: I don't have to modify the whole library, only add functions to support it. Thanks again.
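For what it's worth, the software analogue of those disparity detectors is only a few lines with OpenCV's block matcher (a minimal sketch, assuming OpenCV 3+ Python bindings and an already rectified pair; left.png and right.png are hypothetical filenames):

    import cv2

    # Load a rectified stereo pair (hypothetical filenames)
    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # Block matching: each left-image patch is searched for along the same
    # scanline in the right image, i.e. a software "disparity detector"
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(left, right)   # int16, scaled by 16

    # Rescale for viewing and save
    view = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX)
    cv2.imwrite("disparity.png", view.astype("uint8"))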
|
|
|
|
|
|
My answer will be simple: artificial intelligence is still nowhere.
|
|
|
|