Radios stop responding every day at same time for 1 hour


#1

For the past three days, both of my B5’s have stopped responding to pings and SMTP requests at exactly 6:15am MDT for exactly one hour (or until a reboot). They continue to pass traffic normally and they continue to talk to Mimosa cloud.

No other radios (all non-Mimosa) in our network have any issues at this same time. We monitor internally and the management network is operating fine. I can confirm that I am not able to ping either radio from multiple locations on the network.

The fact that this has happened at the exact same time and lasted exactly one hour each time points to a software issue. We are running firmware 1.3.1.

Has anyone else ever seen this?


#2

Hi Chadwick,

We’re not seeing the data on the Mimosa cloud. Would you please send in the support logs? Go to Diagnostics > Logs > Support Info > click the blue Support Files button to download a .tar file from each radio to your local computer, and then email the files to support@mimosa.co.


#3

I can confirm that this issue occurs for us as well. However, I assume you mean SNMP, as that is the issue we experience.


#4

Hi Chris,

Chad’s problem symptom was not specific to SNMP. There is another radio on the same channel that is always there but becomes more active at that time of day, reducing SNR and causing a bottleneck. There is also wide power variance between channels which could reduce the effectiveness of automatic gain control. We suggested a few channel and power changes to troubleshoot.

RF problems at certain times of day are not uncommon, which is why it is useful to look at spectrum data, PER, and errors over time on the cloud. Evening hours are common for video streaming, for example. Sometimes the extra traffic is related to scheduled backups, especially in commercial situations.

When SNR drops, the capacity is reduced, and lower priority traffic suffers the most.


#5

Very interesting, as this happens at some of our collocated sites. I will investigate to see if we are experiencing SNR issues. I just assumed it was something software related, as it happens for 1 hour on the nose, and the radio seems to pass traffic normally.

Thanks!


#6

Hi Chris,

If you point us to the specific link, we can take a look at cloud data and logs.


#7

My problem does seem specific to SNMP and ping response. At exactly the same time, the radio stops responding to SNMP and Ping requests for exactly one hour. I can’t believe that is related to SNR issues with another radio. Traffic continues to pass during the SNMP outage. Both ends of the radio stop responding and the AP end is connected directly to a switch with our monitoring computer. If it was a capacity issue, that would not explain the AP end, just the client end.

The times the interference increases do not line up with the SNMP outage exactly. When the capacity is reduced, it still has plenty of headroom. Our links are very quiet at the time of day this happens.

I was convinced this was software but the new beta does the same exact thing.


#8

If the device does not respond to ping, then no management traffic of any kind will work. This is a different issue, so we will dig through the logs and see if we can learn what the problem might be.

Chris, at what time of day do you observe the problem?


#9

From Original Post…


#10

Hi, Just wondering if there was any resolution to this? We are seeing the exact same behavior with a pair of B5’s. Not sure if the times line up. Our time zone is AEDT and the Mimosas stop pinging at approx 0915 every morning. Traffic is traversing the network no problem, however both B5’s stop responding to a ping from our network monitoring software for exactly 1 hour at the same time every day. We are not using SNMP.


#11

What are you using to monitor your network? I think I have tracked this down to a PRTG problem and not an issue with the B5’s. While it ONLY stops seeing Mimosa radios at the same time for exactly one hour, it does not appear to be Mimosa’s fault. When we go to our PRTG machine and command line a ping request to the Mimosa, it comes back failed - so the entire PRTG computer stops seeing Mimosa radios. Every other device on our network can ping the Mimosa just fine except the PRTG computer.

I’m honestly at a loss and gave up trying to fix it. It does it for 30 days after a reboot and then starts working normally until we re-boot that Windows PC again.
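For anyone who wants to repeat the comparison described above (one machine can reach the radio, the monitoring machine cannot), here is a rough sketch of a timestamped reachability probe that could be run on both machines at once. It is only an illustration: the function names are made up for this example, and it assumes the radio's web interface accepts TCP connections on port 80, which is a guess based on the HTTP access mentioned in this thread.

```python
import socket
from datetime import datetime


def tcp_reachable(host, port=80, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def probe_line(host, port=80):
    """Build one log line: local timestamp, target, and OK/FAIL."""
    status = "OK" if tcp_reachable(host, port) else "FAIL"
    stamp = datetime.now().isoformat(timespec="seconds")
    return f"{stamp} {host}:{port} {status}"


# Hypothetical usage (192.0.2.10 is a documentation address, not a real radio):
#     print(probe_line("192.0.2.10"))
```

Calling `probe_line` once a minute on the monitoring machine and on a second machine, then lining up the two logs by timestamp, would show whether the radio is unreachable from everywhere or only from the monitoring machine.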


#12

Hi Chadwick, Thanks for the feedback. Yes, we are using PRTG, so I suspect you might be onto something here. I was having a similar experience yesterday when I could ping the B5s from other machines but not the PRTG monitoring server.

I might try configuring an SNMP trap on the B5 and monitoring it with SNMP to see if it is any better and if this problem still occurs.

Thanks for the feedback and prompt response.


#13

Here’s where it gets really odd. Everything from PRTG to the Mimosa fails: ping, SNMP, HTTP, etc. It’s like the Mimosa radios fall off the network for exactly one hour and only from the PRTG machine. I got PRTG involved, sent them logs, ran some tests and they could only conclude that my Windows machine was the problem. But it only happens when I update or restart the PRTG server.

If you figure this out, let me know. It’s been 32 days since my last restart of PRTG so it’s back to working for me again. I won’t be annoyed until we upgrade PRTG again.


#14

I have just completed some packet captures on the PRTG machine when the problem is occurring.

At this point I am leaning towards the problem being with the B5’s. Two observations that lead me to this are:

  1. I can see ICMP packets constantly generated from PRTG in Wireshark, with no response, then they just start working at the end of one hour. Nothing changes on the PRTG box.

  2. The second observation is that the B5’s don’t go offline at exactly the same time; it is a couple of minutes apart. Given they get their time from GPS etc., it’s obviously not time related. I note that they go offline and come back up in the same order that they were last rebooted in.

If I get some time over the next week or two I will port mirror and packet capture on the switch the Mimosa is plugged into and see if ping packets are getting to the B5.

We are running firmware 1.4.1 BTW.


#15

I am running PRTG on a Windows 7 Professional machine. The Mimosa radios are all running 1.4.1 but this happened with the two previous firmware versions as well.

I might agree that they go “down” within a minute of each other, not at exactly the same time. We ping radios every 60 seconds so it is not possible to tell if they stop at the same second.
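To put a number on that uncertainty, here is a small sketch that turns a poll log into outage windows. With one ping per interval, all the log can say is that the radio stopped answering somewhere between the last good ping and the first failed one. The function name and the sample data are made up for illustration (the date is arbitrary; only the 6:15 start and the one-hour length come from this thread).

```python
from datetime import datetime, timedelta


def outage_windows(samples):
    """samples: list of (timestamp, ok) tuples sorted by time.

    Returns one (last_ok, first_fail, recovered) tuple per completed outage.
    The real start of the outage lies somewhere between last_ok and first_fail.
    """
    windows = []
    last_ok = None
    first_fail = None
    for ts, ok in samples:
        if ok:
            if first_fail is not None:
                # First good reply after a run of failures closes the outage.
                windows.append((last_ok, first_fail, ts))
                first_fail = None
            last_ok = ts
        elif first_fail is None:
            first_fail = ts
    return windows


# Made-up log: one ping per minute from 06:00, failing from 06:15 through 07:14.
base = datetime(2016, 1, 1, 6, 0)  # arbitrary date, for illustration only
samples = [(base + timedelta(minutes=i), not (15 <= i < 75)) for i in range(90)]

for last_ok, first_fail, recovered in outage_windows(samples):
    print("went down between", last_ok.time(), "and", first_fail.time())
    print("observed outage:", recovered - first_fail)
```

Two radios polled this way could each have gone down anywhere inside their own one-minute gap, so start times that differ by less than the polling interval cannot be told apart from the log alone.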

What I can’t explain is when they stop responding to PRTG, they are still very responsive to all other devices on the network and the Mimosa cloud. It’s just PRTG that can’t see them.


#16

Resolved:

Hi Chadwick, thanks for your pointer that this was a PRTG problem and not a Mimosa issue. I have now resolved this problem and you are quite correct, it is a PRTG issue not a Mimosa issue.

It seems there is a bug in PRTG whereby if you adjust the ‘Ping delay in MS’ field, as I had done to allow for some of the latency over the Mimosa link, you get this issue where communication to the node is cut off.

To resolve it, I paused my current sensors and created new Ping sensors with all defaults. I left it to run for two days with no problems. If you unpause the problem sensor with the delay configured, it will stop all other sensors.

Hope this helps you out in getting your monitoring working as well.


#17

I have a customer with a B5-Lite having the same problem, with pings and SMTP not responding roughly once an hour, using The Dude to monitor. Firmware 1.4.4.


#18

I finally gave up on this. We use PRTG to monitor our entire network. For a while I thought my PRTG Windows 7 machine was the problem. When PRTG lost access to the Mimosa radios, the whole Windows 7 machine lost access to them (could not ping from command line, could not get web page to open up for any Mimosa radio).

I then installed a second PRTG install on a Windows 10 machine on a different physical part of our network. Guess what?? After two days, all Mimosa radios stopped responding to SNMP requests from THAT server as well for exactly one hour. Again, can’t ping Mimosa radios or access them via HTTP from that Windows machine.

So, I now have two different computers on two different parts of our LAN that both can’t get at Mimosa hardware for exactly one hour each - but at different times of the day. When they can’t talk to Mimosa radios, they are the ONLY devices on the network that can’t access them.

It really looks like the Mimosa radios have a firewall or something in the software that shuts off access from a certain IP every 24 hours for 60 minutes. It’s like it thinks it is seeing an attack and stops access. I can prove it is not my hardware and not my network with the problem. What also makes me think this is in the Mimosa software is the time of day it shuts off access directly correlates to the last reboot time of the monitoring machine. So, if our PRTG server was re-booted at 3am, at 3am two days later, Mimosa radios stop allowing access from that PRTG machine for 60 minutes.
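One way to check this hunch against the monitoring logs is to compute when the blackouts should fall if they really are tied to the reboot time, then compare with what was recorded. The sketch below just encodes the pattern described above (first blackout two days after the reboot, repeating every 24 hours, 60 minutes long). Those numbers are my reading of the symptoms, not anything confirmed about the radio software, and the function names and the two-minute tolerance are made up for the example.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(hours=24)   # assumed repeat interval
LENGTH = timedelta(minutes=60)  # observed blackout length


def expected_windows(reboot, count, first_delay=timedelta(days=2)):
    """Predicted (start, end) blackout windows after a monitoring-host reboot."""
    first = reboot + first_delay
    return [(first + i * PERIOD, first + i * PERIOD + LENGTH) for i in range(count)]


def fits_reboot_pattern(outage_start, reboot, tolerance=timedelta(minutes=2)):
    """True if outage_start falls at the reboot's time of day, within tolerance."""
    offset = (outage_start - reboot) % PERIOD
    return min(offset, PERIOD - offset) <= tolerance


# Hypothetical reboot at 3am (the date is arbitrary).
reboot = datetime(2016, 3, 1, 3, 0)
for start, end in expected_windows(reboot, 3):
    print(start, "to", end)
```

If the recorded outage start times for each monitoring machine pass `fits_reboot_pattern` against that machine's own reboot time, that would support the correlation. If they do not, the reboot timing is probably a coincidence.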

It’s hard to explain in words but I really think there is something in the Mimosa software that makes it “protect” the radio by blacklisting an IP for 60 minutes.


#19

Could it be a “cron” process that is running on the B5? Some of them are 24-hour cron routines (once every 24 hours). Might it take an hour to complete and, in the meantime, shut off some services?


#20

We don’t see this on our PRTG machines.

But, we have outdated Mimosa firmwares, and our PRTG is old as dirt also LOL