|
|
Recently I had to work with an AGV controller to integrate it with a .NET application via a TCP/IP socket. The factory uses it to control robots on the factory floor. The only issue is that the factory is working at 100% capacity (by the way, it produces water heaters) and there is only one controller, which I can use only when one of the production lines is not using it. Everything worked OK when I tested it and during UAT, but strange issues kept happening in production, and the only way to test those was to put my untested code in the production environment. Finally I was able to figure out that the controller gives up the connection once in a while, failing to recognize a message from .NET or the other way around. I still have no idea why it does that, but at least I was able to provide a solution that reconnects and resends. Deploying untested code in production was scary.
What would you do when there is no way to replicate situations happening in the production environment?
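For readers wondering what a reconnect-and-resend workaround can look like, here is a minimal sketch. It is written in Python rather than .NET to keep it short, and it is not the actual fix deployed at the factory. The class name `ResilientLink`, the helper `tcp_connector`, the retry count and the delay are all assumptions made for illustration. It also assumes that resending a message is safe, that is, the controller does not act twice on a duplicate.

```python
import logging
import socket
import time

log = logging.getLogger("agv")


def tcp_connector(host, port, timeout=1.5):
    """Return a callable that opens a fresh TCP connection (host/port are placeholders)."""
    return lambda: socket.create_connection((host, port), timeout=timeout)


class ResilientLink:
    """Send a message and wait for a reply, reconnecting and resending on failure."""

    def __init__(self, connect, retries=3, delay=0.5):
        self._connect = connect   # callable returning a connected socket-like object
        self._retries = retries   # total attempts per request (assumed value)
        self._delay = delay       # pause between attempts, in seconds (assumed value)
        self._sock = None

    def _drop(self):
        """Throw away the current connection so the next attempt starts clean."""
        if self._sock is not None:
            try:
                self._sock.close()
            except OSError:
                pass
            self._sock = None

    def request(self, payload: bytes) -> bytes:
        last_error = None
        for attempt in range(1, self._retries + 1):
            try:
                if self._sock is None:
                    self._sock = self._connect()
                self._sock.sendall(payload)
                reply = self._sock.recv(4096)
                if not reply:
                    # An empty read means the other side closed the connection.
                    raise ConnectionError("peer closed the connection")
                return reply
            except OSError as exc:  # includes timeouts and connection resets
                last_error = exc
                log.warning("attempt %d of %d failed: %r", attempt, self._retries, exc)
                self._drop()
                if attempt < self._retries:
                    time.sleep(self._delay)
        raise ConnectionError(f"gave up after {self._retries} attempts") from last_error
```

Usage would be along the lines of `ResilientLink(tcp_connector(host, port)).request(b"...")`, with the real address and message format filled in. Logging each failed attempt matters as much as the retry itself, since that is the only record of how often the link actually drops.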
Zen and the art of software maintenance : rm -rf *
Math is like love : a simple idea but it can get complicated.
|
|
|
|
|
virang_21 wrote: What would you do when there is no way to replicate situations happening in production environment ?
Pray.
(Seriously though, I sympathize that you had to put untested code into prod.)
/ravi
|
|
|
|
|
I'm in a similar situation WRT a BI tool that's too expensive for us to buy a second license for a dev server; the best option I've found is to keep a supply of medicinal scotch and bourbon at home.
OTOH we at least can schedule our test deployments around when no-one needs the server to be available; which doesn't appear to be the case in your problem.
Did you ever see history portrayed as an old man with a wise brow and pulseless heart, weighing all things in the balance of reason?
Is not rather the genius of history like an eternal, imploring maiden, full of fire, with a burning heart and flaming soul, humanly warm and humanly beautiful?
--Zachris Topelius
Training a telescope on one’s own belly button will only reveal lint. You like that? You go right on staring at it. I prefer looking at galaxies.
-- Sarah Hoyt
|
|
|
|
|
|
|
It loads now.
|
|
|
|
|
|
This problem has *already* been solved.
Many times over.
Since computers were invented.
By NASA, by the Military, by safety critical industries.
Seek education in their solutions.
|
|
|
|
|
Michael Kingsford Gray wrote: By NASA, by the Military, by safety critical industries.
Yep, when something blows up, they patch it.
To alcohol! The cause of, and solution to, all of life's problems - Homer Simpson
----
Our heads are round so our thoughts can change direction - Francis Picabia
|
|
|
|
|
If only you had their budget...
They will never have seen anything like us them there. - M. Spirito
|
|
|
|
|
I've done that more times than I'd like to admit...
Strange error occurring in production, test on my machine and everything runs fine.
Just make some fix that -could- solve the problem and ask the user to test it.
We don't have any specialized test environment or anything. So if it doesn't work on my computer, or that of a colleague, well, we're forced to test in production.
Actually, only our two biggest customers test our changes in a test environment.
Our other customers can't be bothered to set up test environments (or to have us do it).
So when we've done the 'developer testing' we can put the code in production and pray we've programmed what the customer wanted.
Luckily this approach works for us and our customers most of the time (we always keep backups of previous versions in case something breaks).
It's an OO world.
public class SanderRossel : Lazy<Person>
{
    public void DoWork()
    {
        throw new NotSupportedException();
    }
}
|
|
|
|
|
Add a LOT of logging to see what is happening.
Always assume connections are not open and test them before attempting any communication.
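One way to get that logging without touching every call site is to wrap the socket. The sketch below is only an illustration: `LoggedSocket` and the logger name `agv.wire` are made-up names, and it assumes a Python-style socket object with `sendall`, `recv` and `close`. It does not try to probe whether the connection is still open, because with TCP a failed send or an empty read is often the first reliable sign that it is not. It records exactly that when it happens.

```python
import logging

log = logging.getLogger("agv.wire")


class LoggedSocket:
    """Wrap a socket-like object and log every byte that crosses it."""

    def __init__(self, sock, name="controller"):
        self._sock = sock
        self._name = name   # label used in log lines (hypothetical)

    def sendall(self, data: bytes) -> None:
        # Log before sending, so the attempt is on record even if the send fails.
        log.info("%s TX %d bytes: %s", self._name, len(data), data.hex(" "))
        try:
            self._sock.sendall(data)
        except OSError:
            log.exception("%s TX failed", self._name)
            raise

    def recv(self, bufsize: int) -> bytes:
        try:
            data = self._sock.recv(bufsize)
        except OSError:
            log.exception("%s RX failed", self._name)
            raise
        if data:
            log.info("%s RX %d bytes: %s", self._name, len(data), data.hex(" "))
        else:
            # An empty read means the peer closed the connection.
            log.warning("%s RX: peer closed the connection", self._name)
        return data

    def close(self) -> None:
        log.info("%s closing", self._name)
        self._sock.close()
```

Hex dumps are used instead of text because a message the other side "fails to recognize" often differs only in a control character or a line ending that would be invisible in a plain string.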
|
|
|
|
|
Add lots of logging until you are sure you understand what the real problem is (although sometimes you have to take your best guess or add code to fix the symptoms without finding the root cause in a truly complicated system).
I worked on an automated conveyor system at one time. One customer kept having their box counts off and the shipping company was so aggravated they threatened not to ship for the company any more. The programmer tried lots of things to fix the "faulty sensors". The problem was finally solved when he was at the company late one night and saw a bored employee playing with the blinking lights. The employee had no idea it was throwing the shipping counts off.
Production always seems to come up with some combination of circumstances you don't think of.
|
|
|
|
|
Member 8824288 wrote: Production always seems to come up with some combination of circumstances you don't think of.
Don't they ever! I could probably share a ton of horror stories with you since I used to program conveyors as well.
Of course the biggest problem we always had was what I called the U-shaped communication channel. A worker would note a problem and mention it to his supervisor, who'd mention it to his boss, and so on until the company I worked for was called about it, and then it eventually filtered down to me. (AKA the Telephone Game)
What would start as a simple problem would turn into "The conveyor is on fire, running backwards."
We once thought we were clever hiding a "Dump All" report behind "Ctrl-A", never expecting the operators to rest their palms on the keyboard, covering CTRL and A, with the terminal auto-repeating requests for reports that would print for half an hour each. We had to reset the computer because we had not added a "delete print job" to our report queue, although even if we had, there would have been hundreds to delete.
Psychosis at 10
Film at 11
Those who do not remember the past, are doomed to repeat it.
Those who do not remember the past, cannot build upon it.
|
|
|
|
|
Like others have said, plenty of logging always helps.
Including unit/integration tests in your solution can also help you to simulate various error conditions and allows you to design your code to handle these scenarios.
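As a rough illustration of simulating an error condition in a test, the sketch below feeds a dropped connection into a small retry helper using Python's standard `unittest` and `unittest.mock`. The helper `send_with_retry` and the message `MOVE 7` are hypothetical and exist only so the test has something to exercise.

```python
import unittest
from unittest import mock


def send_with_retry(send, payload, attempts=3):
    """Call send(payload) until it succeeds or attempts run out (hypothetical helper)."""
    for attempt in range(attempts):
        try:
            return send(payload)
        except ConnectionError:
            if attempt == attempts - 1:
                raise   # out of attempts: let the caller see the failure


class SendWithRetryTests(unittest.TestCase):
    def test_recovers_from_one_dropped_connection(self):
        # First call raises, second call returns a reply.
        send = mock.Mock(side_effect=[ConnectionResetError("dropped"), b"ACK"])
        self.assertEqual(send_with_retry(send, b"MOVE 7"), b"ACK")
        self.assertEqual(send.call_count, 2)

    def test_gives_up_when_link_stays_down(self):
        # Every call raises, so the helper must stop after two attempts.
        send = mock.Mock(side_effect=ConnectionResetError("dropped"))
        with self.assertRaises(ConnectionError):
            send_with_retry(send, b"MOVE 7", attempts=2)
        self.assertEqual(send.call_count, 2)
```

Saved in a file, this can be run with `python -m unittest`. The point is that the failure is injected on demand, so the error path gets exercised without waiting for production to produce it.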
modified 31-Aug-21 21:01pm.
|
|
|
|
|
I work in a similar environment where there is one "machine": a fully automated manufacturing process with robots, vacuum chambers, heaters, gauges, etc. At times I cannot get to this equipment because it is under test, and this keeps me from testing my code base. I finally bit the bullet and created a number of virtual machines which are deployed onto other computers. These vMachines use the identical messaging systems (TCP/IP, RS232, RS485), the same infrastructure used in production. The nice thing about my virtual machines is that they can be set up to fail for specific reasons. Also, the virtual machines do not need to reproduce the full list of features, just the areas that you need to test.
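A very small version of such a virtual machine, covering only the TCP/IP side, might look like the sketch below. It is not the poster's implementation. The line-based protocol, the `ACK` reply and the fault names `drop` and `garbage` are invented for illustration, and a real stand-in would follow the device's own message format.

```python
import socketserver
import threading


class _Handler(socketserver.StreamRequestHandler):
    """Answers one line at a time, or misbehaves according to server.fault."""

    def handle(self):
        for line in self.rfile:
            fault = self.server.fault
            if fault == "drop":
                return                              # close without answering
            if fault == "garbage":
                self.wfile.write(b"\xff\xfe??\n")   # reply the client cannot parse
            else:
                self.wfile.write(b"ACK " + line)    # normal (made-up) reply


class FakeController(socketserver.ThreadingTCPServer):
    """Stand-in for a real device, listening on a local TCP port."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address=("127.0.0.1", 0)):   # port 0: let the OS pick one
        super().__init__(address, _Handler)
        self.fault = None   # None, "drop" or "garbage"; can be changed while running


def start_fake_controller():
    """Start the simulator in a background thread and return the server object."""
    server = FakeController()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The client under test connects to `server.server_address` exactly as it would to the real device. Setting `server.fault = "drop"` makes the next message end with a closed connection, which is the kind of failure that is hard to trigger on production equipment.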
|
|
|
|
|
I used to program computer-controlled conveyors, and there was only one device to use: the production conveyor. They build only one.
Initially the programmers were too worried about processor headroom and would only put static menus on the screens, all the while saying tests they had run showed the conveyor was only consuming 10% of the processor. (12 MHz days)
My programming buddy and I started putting status displays on the screens that displayed every internal number that we thought would be useful, even though some screamed by on the screen too fast to be of any use unless the program crashed and we could then get a hint of what was going on by the displayed numbers.
Once we got that to work, I advanced the art by building screens with graphical displays (IBM text graphics) in the shape of the conveyor that showed photoeyes being blocked and diverters being fired. I turned one of the unreadable numbers, the index into the internal pseudo belt that we tracked the product through, into a feet-per-minute display. We once had a conveyor failing because for some reason the motor speed was wrong, and that display became a handy way to verify speed without having to put a manual tach against the hub of the motor.
I also displayed numerical messages that came from the PLC by expanding the message into English on a scrollable subsection of the status display.
We were running DOS 3.3 and used the multitasker the FORTH language vendor had built into their implementation. We had 10 tasks running and occasionally a task would die and we wouldn't know which one. So I modified the multitasker to have a flag, when set, that would display the name of the current task in control on the 25th line of the CRT. So the name of the offending task would be displayed when it died. I also modified the multitasker from being round robin to support priorities by adding a parameter that said how many times around the task loop it should wait before executing instead of immediately surrendering control. That let me put the status display to a lower priority than say, the communications handler.
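The two multitasker changes described above can be sketched in a few lines. This is Python, not the original FORTH, and the names `Task`, `Scheduler`, `skip` and `current` are assumptions for illustration. It shows the idea only: a round-robin loop where a task with a skip count gives up its turn that many times between runs, and where the name of the task in control is recorded before it runs so it is still known if the task dies.

```python
class Task:
    def __init__(self, name, step, skip=0):
        self.name = name
        self.step = step      # callable run each time the task gets a turn
        self.skip = skip      # passes of the loop to sit out between turns
        self.waiting = 0      # passes still to sit out before the next turn


class Scheduler:
    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.current = None   # name of the task in control (the "25th line" flag)

    def run_pass(self):
        """One trip around the task loop."""
        for task in self.tasks:
            if task.waiting > 0:
                task.waiting -= 1         # surrender control immediately
                continue
            task.waiting = task.skip      # reload the counter for next time
            self.current = task.name      # recorded before running, in case it dies
            task.step()


# A communications handler that runs every pass and a status display
# that runs every third pass.
scheduler = Scheduler([
    Task("comms", lambda: None),
    Task("status", lambda: None, skip=2),
])
for _ in range(6):
    scheduler.run_pass()
```

With `skip=2` the status display runs on passes 1 and 4 while the communications handler runs on all six. If a task's `step` raises, `scheduler.current` still holds its name.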
So in summary, start adding status displays to your controller system that can help you narrow down failure points.
It also helps to have a buddy working with you. Since these systems were real time, we couldn't single step the code. We ended up sounding like doctors comparing symptoms. Maybe it's "X", but if it's "X", we should be seeing "Y", and we're not, so it's got to be "Z", and we'd eventually narrow the reasons for failure. If it became indeterminate, we added status displays to help us sort it out.
Good luck, Mr. Phelps.
modified 23-Jun-14 13:57pm.
|
|
|
|
|
I was writing a fuel management software package, and had to write a virtual fuel pump to test with, based off the service manual.
I need an app that will automatically deliver a new BBBBBBBBaBB (beautiful blonde bimbo brandishing bountiful bobbing bare breasts and bodacious butt) every day.
John Simmons / outlaw programmer
|
|
|
|
|
virang_21 wrote: What would you do when there is no way to replicate situations happening in production environment ?
In normal server work there are ways to help with the situation.
1- Add logging (over time one would just use the log output.)
2- Build a simulator (over time it gets better.)
3- Unit testing.
4- Very rigorous design (at implementation level) and rigorous code reviews.
The fourth is often too time-consuming to apply to all code and probably too boring for most developers. Although the latter would seem to be just something that developers must do, the reality is that humans tend to get glassy-eyed if one attempts to force this for all code. That's just the way humans work. But eyeballing a very small but very critical subset can help.
The last, one can do as an independent developer, but one is still subject to the boredom factor. At a minimum, I find that I can only attempt such detailed reviews of my own code after waiting a day. If I attempt it immediately I don't really see the code.
3 and 4 can be used together. However, whether they are applicable depends on the specifics of the system.
|
|
|
|
|
jschell wrote: In normal server work there are ways to help with the situation.
1- Add logging (over time one would just use the log output.)
2- Build a simulator (over time it gets better.)
3- Unit testing.
4- Very rigorous design (at implementation level) and rigorous code reviews.
1. That is how I was able to figure out what was going on: logging every condition, every message sent and received, and all the variable values at the moment it fails.
2. The controller I am using is from a company in Germany... Manual? What is that? Some old flowchart is all I had to program it with...
3. Unit testing / BA testing / UAT did not pick up those errors, because if you produce a few items it works like a charm. But when it is constantly in use, one of the production lines needs a response time of 1.5 seconds at most, the full factory is using the application on the same network, and the controller is connected to the network via WiFi, it gets complex...
4. Code reviews? Who will do that when you are the only developer? I have to put a hand on my heart and tell them that it is not the code that is failing but something else that is causing it... I have to deal with a PM who keeps insisting that it must be my code that is wrong... No technical help, just looking at the flowchart and insisting I must be doing something wrong...
|
|
|
|
|
virang_21 wrote: 4. Code Reviews ? Who will do that when you are the only developer
As I suggested, do it yourself. Actually I have often found that I must do that myself with critical code even when there are other developers, because they just won't take it seriously. Walking through my own code, just the critical pieces, not immediately but after a day, allows me to verify the logic if I do it in detail.
|
|
|
|
|
I've never had any issue with doing that when I needed to. Just be careful.
virang_21 wrote: one controller available to use when one of the production line is not using
You actually have it pretty good. I have had jobs where there was only one of the system in the company (and there was no way they would buy or build another simply for development and test).
It's one of the things that separates the men from the boys.
You'll never get very far if all you do is follow instructions.
|
|
|
|
|
I feel for her, and can only imagine how good she'll feel if I help take it[^] off her hands.
|
|
|
|
|
If he isn't yet Ex-Husband, there is a risk she is selling it without his permission, therefore you are handling stolen goods (if you win)...
|
|
|
|