C5s stop passing traffic


#1

I have a couple of A5’s with a couple dozen C5’s connected to them, and I have seen this issue 4 times in recent weeks, and probably a dozen times in the past 6 months.

For legacy reasons I use PPPOE on the client side, so in this case the client’s router is a PPPOE client through the C5 to the Mikrotik NAS, which sits behind a Netonix switch that tags/untags VLANs and powers the A5 at the tower site.

The A5 is on one VLAN for management (100) and there is another VLAN trunked through the A5 to the C5 (101) for CPE management.

What is happening is that randomly a C5 remains manageable via the wireless interface but it won’t pass PPPOE and it cannot be managed over the Ethernet interface using 169.254.200.20. The Ethernet Interface is up and not flapping.

When the C5 is taken out of service and brought back to the office where it’s not in range of the A5 it can then be managed over the Ethernet interface on the 169 IP (directly plugged into the same POE and Laptop that it wouldn’t talk to when deployed) and after performing a Reset Defaults it can be re-programmed and successfully redeployed. I’ve only just tried this on the 2 most recent cases and they are both back in production now - I wrote the others off as borked :frowning:

The C5’s are properly grounded and the issue randomly affects different C5’s. Rebooting the A5 does not resolve the problem. Replacing the POE does not resolve the problem. All are running the current firmware.


#2

I have not seen this, but I am a heretic who doesn’t run PPPOE or DHCP, and we are only doing a few VLAN deals.

What do the logs say when it goes into Dead PPPOE? (can I coin that for this problem?) Does the C5 become manageable if you un-align it from the sector? (with or without reboot?) Have you tried downgrading firmware? (I guess I should first ask what firmware you are on, but I like skipping steps. But if you are not on the latest firmware what happens when you go to it?)


#3

Heretic - lol I WISH I could not use PPPOE.

Nothing in the logs. This seems to coincide with a power cycle of the C5. Shows Ethernet up 100Full and no traffic.

We’re talking less than 200m to the A5. So the C5’s are still connected when they are in the backseat of the truck.

But the hint on the reset was that I remotely reset one to Factory when it was still deployed and the tech who was plugged in on the test cable to the C5 could then ping it. He configured it up and as soon as it associated to the A5 he could PPPOE. Put it back on the customer’s cable and PPPOE Client and it’s still working. Since that one we’ve brought one back to the office and reset it here and sent it out on a new install and it’s working too.

Yes, running the latest firmware on both A5 and C5. Haven’t tried downgrading… afraid of bricking it, mind you they seem to be bricking themselves so maybe worth a try…


#4

Promise, the grass is greener in Static IP Land. :wink:

From what you are saying, this should happen for any link and not just ones utilizing PPPOE. Are the routers/computers you are using gig Ethernet ports or just 100M? I am reminded of a known issue that happens if the Ethernet Port on a C5 comes online under heavy traffic: http://client.help.mimosa.co/client-ptmp-firmware-release-notes-c5

Do you use DHCP for the antenna IP or are you static?

This is what I am understanding your network layout to be:
C5 -> A5 -> Netonix -> Mikrotik NAS -> (Internet or close enough)

I am guessing the “Mikrotik NAS” is also your PPPOE server, if not, what and where is it in your network layout?

Also, what is your path to the wireless management interface?

If you are already using Mikrotiks, what happens if you try to “ARP ping” the antenna from each side? (Wireless and Wired)


#5

Some customer routers are gig, others are 100Mbps. It’s not under heavy load when it’s just the tester’s laptop plugged in, and still they cannot even ping the stateless IP (169.x.x.x).

Yes the CPE’s get their IP addresses in the Management network from a DHCP server. The DHCP server logs are read by the Helpdesk/CSR’s Management console so that they can single click to get to a customer’s radio.

You have the lay out right. Every tower is the same : AP <> Netonix <> Mikrotik NAS <> Backhaul <> Internet POP

Wireless management interface is on the Management Network, which is logically separate from the User network traffic but uses the same physical transport.

Only the NAS is Mikrotik - the customer Router is whatever they buy. Mostly DLink because that is what we recommend but also Netgear, ASUS and Linksys. It doesn’t seem to matter what the PPPOE client is because even the installer’s Laptops which run the PPPOE Connectoid on them won’t connect.

The lack of any type of Ethernet stats on the C5 makes it very difficult to see traffic, but even when I span the A5 switch port into a packet sniffer I DON’T see the PADI packets that initiate a PPPOE session coming through, AND the installer can’t even ping the C5 loopback.

It’s like the C5 goes into a spanning tree fit and disables traffic on the Ethernet Interface.
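
For anyone checking a capture for the missing PADI frames, here is a minimal Python sketch (not anything from Mimosa or Mikrotik). It assumes you already have raw Ethernet frames as bytes, for example exported from the sniffer, and it relies on the PPPoE discovery layout from RFC 2516: EtherType 0x8863, version/type byte 0x11, and code 0x09 for PADI. The function names and the sample MAC address are made up for illustration.

```python
import struct

ETH_P_8021Q = 0x8100       # 802.1Q VLAN tag
ETH_P_PPPOE_DISC = 0x8863  # PPPoE discovery stage (RFC 2516)
PADI_CODE = 0x09           # PPPoE Active Discovery Initiation


def pppoe_discovery_code(frame: bytes):
    """Return the PPPoE discovery code of a raw Ethernet frame, or None."""
    if len(frame) < 14:
        return None
    offset = 12
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    if ethertype == ETH_P_8021Q:
        # Skip the 2-byte TCI and read the inner EtherType.
        if len(frame) < offset + 4:
            return None
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    if ethertype != ETH_P_PPPOE_DISC or len(frame) < offset + 6:
        return None
    ver_type, code = struct.unpack_from("!BB", frame, offset)
    if ver_type != 0x11:  # version 1, type 1
        return None
    return code


def is_padi(frame: bytes) -> bool:
    return pppoe_discovery_code(frame) == PADI_CODE


def build_padi(src_mac: bytes) -> bytes:
    """Build a bare broadcast PADI, only used here to exercise the parser."""
    # Service-Name tag (0x0101) with zero length means "any service".
    tags = struct.pack("!HH", 0x0101, 0)
    pppoe = struct.pack("!BBHH", 0x11, PADI_CODE, 0x0000, len(tags)) + tags
    return b"\xff" * 6 + src_mac + struct.pack("!H", ETH_P_PPPOE_DISC) + pppoe


frame = build_padi(bytes.fromhex("020000000001"))  # hypothetical client MAC
print(is_padi(frame))
```

Running `is_padi` over every frame seen on the spanned port would show whether any PADI ever makes it past the C5, whether it arrives tagged or untagged.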


#6

Have you tried to ping through the C5 while it is booting up, maybe see if there is a short period of time where it will pass traffic before it freaks out?

Do you have flow control enabled?


#7

Without PPPOE running nothing passes through. I tried setting up a DHCP server on that VLAN and had the Tech setup a DHCP client on his Laptop and there were no requests received at the DHCP server.

I tried with and without Flow Control and we tried hard coding both the C5 Interface speed and duplex. Nothing.

Turns out that a number of the other C5’s that I thought went to e-cycling in fact got shelved in storage, waiting for enough to make a load. I’m going to pull them out and see if they can be recovered as well.


#8

Have you tried ARP Pinging through? I can ARP ping from both sides of a C5 with a Mikrotik, but it should be something doable with Windows.

Yay! Thank goodness for saving money and a hit of laziness.


#9

Hi ian1,

You are not alone. I’m also facing the same situation as you. Dlink Router -> NID -> C5 -> A5 - Mikrotik router with PPPoE Server -> Internet/Metro-E.

Sometimes PPPoE looks dead because the DNS server cannot pass through the C5. If you manually set the DNS on your test PC, it will still connect to the Internet via PPPoE. I just don’t know why the DNS server from the router cannot pass through the C5. I have changed the Dlink router to a TP-Link. It is better, but sometimes we still hit the issue, and then the router and C5 need a reboot.

What we have to do is reboot the C5, and the issue is temporarily resolved until it happens again. I am guessing this is a C5 bug in firmware 2.4.1 or below, because I have already upgraded from 2.3.3 to 2.4.1 and the issue is still happening.

What we can do for now is set up a watchdog reboot on the C5 under management->watchdog.


#10

This PPPOE issue occurred again today after a power outage. 1 out of 5 C5’s on the same tower would not pass PPPOE. Rebooting it did nothing. Tech on site could not manage it locally on the built in loop back address and it would not pass PPPOE but it was connected to the tower and the Ethernet port was up.

On a hunch I sent the C5 for another reboot, and after it disconnected from the A5 I disabled the 2 other VLANs that I am trunking through the A5. When the C5 came back up, it then allowed the customer router to connect to the NAS via PPPOE. I have since re-enabled the 2 other VLANs on the trunk and everything is still working.

Strange stuff.


#11

That is weird, where do the other VLANs go? If the A5 is assigning the C5 to one of those VLANs, then that might explain why you can’t reach the PPPOE server from the router and why it looks like the Ethernet port is coming online but you can’t ping anything through it…


#12

One VLAN is the Management VLAN for all of the C5’s. That VLAN is set on the C5 end and I serve them with Management IP’s via DHCP from the Core. The other VLAN trunks through another C5 to a remote Router as a backup backhaul for a small repeater tower. Both of these VLANS are Tagged.

The PPPOE VLAN is untagged at the AP end because the customer routers may not be able to tag. It is tagged on the NAS end at the tower site. Depending on the load, a tower site might have up to 16 PPPOE NAS operating on 16 different VLANs, with each PPPOE VLAN untagged to individual APs via the Netonix. This is a fairly small site and only has 4 NAS / PPPOE VLANs.
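
To make the tagged/untagged distinction concrete, here is a rough Python sketch of what a switch port does to a frame when it tags or untags it. It follows the 802.1Q layout: a TPID of 0x8100 followed by a 16-bit TCI whose low 12 bits are the VLAN ID. The function names and the sample frame are illustrative only, and VLAN 101 is just borrowed from the first post.

```python
import struct

TPID_8021Q = 0x8100


def add_vlan_tag(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Insert an 802.1Q tag after the destination and source MACs."""
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    if not 0 <= pcp <= 7:
        raise ValueError("priority must be in 0..7")
    tci = (pcp << 13) | vid  # DEI bit left at 0
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def strip_vlan_tag(frame: bytes):
    """Return (vid, untagged_frame); vid is None if the frame had no tag."""
    if len(frame) >= 18 and frame[12:14] == b"\x81\x00":
        (tci,) = struct.unpack_from("!H", frame, 14)
        return tci & 0x0FFF, frame[:12] + frame[16:]
    return None, frame


# Hypothetical untagged frame as a customer router would send it.
untagged = b"\xff" * 6 + bytes.fromhex("020000000001") + b"\x88\x63" + b"payload"
tagged = add_vlan_tag(untagged, 101)
print(strip_vlan_tag(tagged)[0])
```

A frame that arrives at the NAS with the wrong VLAN ID, or with no tag at all, would simply never be seen by the PPPOE server listening on that VLAN.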

I should have mentioned that the power outage affected the entire town but all of our towers are on generator and battery so they didn’t go down, but all of the C5’s did… and this one woke up very angry ?


#13

I would attempt to ARP ping at/through the C5 and see what happens. If this is some sort of bridging issue in the C5, then the ARP pings should not reach the C5, or at least won’t go through it. But if it is the A5 assigning the incorrect VLAN tag (or none at all), then the ARP ping should go through at least to the A5, if not the Netonix…
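
If there is no Mikrotik handy, the frames behind an ARP ping are simple enough to sketch in Python. This is only an illustration of the ARP request/reply layout (RFC 826), not a finished tool: actually sending the frame needs a raw socket with admin rights, which is left out here. The laptop MAC and the 169.254.200.10 source address are hypothetical; 169.254.200.20 is the C5 address mentioned earlier in the thread.

```python
import socket
import struct

ARP_FORMAT = "!HHBBH6s4s6s4s"  # htype, ptype, hlen, plen, oper, sha, spa, tha, tpa


def build_arp_request(src_mac: bytes, src_ip: str, target_ip: str) -> bytes:
    """Build a broadcast 'who has target_ip' Ethernet frame (42 bytes)."""
    eth = b"\xff" * 6 + src_mac + struct.pack("!H", 0x0806)
    arp = struct.pack(
        ARP_FORMAT,
        1,        # hardware type: Ethernet
        0x0800,   # protocol type: IPv4
        6, 4,     # MAC and IPv4 address lengths
        1,        # opcode 1 = request
        src_mac, socket.inet_aton(src_ip),
        b"\x00" * 6, socket.inet_aton(target_ip),
    )
    return eth + arp


def parse_arp_reply(frame: bytes):
    """Return (sender_mac, sender_ip) if the frame is an ARP reply, else None."""
    if len(frame) < 42 or frame[12:14] != b"\x08\x06":
        return None
    fields = struct.unpack_from(ARP_FORMAT, frame, 14)
    oper, sha, spa = fields[4], fields[5], fields[6]
    if oper != 2:  # opcode 2 = reply
        return None
    return sha.hex(":"), socket.inet_ntoa(spa)


request = build_arp_request(
    bytes.fromhex("020000000001"), "169.254.200.10", "169.254.200.20"
)
print(len(request))
```

Because ARP works at layer 2, a reply from one side of the C5 and silence from the other would point at the bridge rather than at IP or PPPOE configuration.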


#14

While I was hunting around in the C5 manual, I saw this in one of the updates:

"CPE Data Port Authentication

The C5 can block access through its Ethernet port until 802.1x radius authentication is complete. This allows home routers/gateways with an EAPOL 802.1x supplicant to authenticate and will prevents unmanaged/unsecure devices from gaining access to the network through the C5 Ethernet port."

How do you do your radio unlocks? I don’t have this as an option, but if you have a Radius server this might be a place to look…


#15

I only use Radius for the PPPOE Authorization on the NAS Side. The customer’s Router is the PPPOE Client, the C5 is a pure bridge and the NAS is a Mikrotik router which is usually at the tower site itself.

This just happened yet again with one of the A5’s. A bit different this time in that the customer was getting a very small amount of data through, but packet loss was >50% pinging the NAS from the client’s PC and the packet loss looked like Spanning tree does when it’s “listening, learning, forwarding/blocking, listening”, etc… So you see maybe 8 replies and then 5 timed out and then 5 replies and 8 timed out etc etc
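
A quick way to put numbers on that kind of on/off pattern is to collapse the ping results into runs. The sketch below is a minimal Python illustration, assuming you have already turned the ping output into a list of True (reply) and False (timed out) values; that parsing step and the function name are not from any particular tool. The sample data is just the "8 replies, 5 timed out, 5 replies, 8 timed out" pattern described above.

```python
from itertools import groupby


def summarize_ping(results):
    """Collapse ping results into (replied, run_length) runs plus loss %."""
    results = list(results)
    runs = [(ok, len(list(group))) for ok, group in groupby(results)]
    lost = results.count(False)
    loss_pct = 100.0 * lost / len(results) if results else 0.0
    return runs, loss_pct


sample = [True] * 8 + [False] * 5 + [True] * 5 + [False] * 8
runs, loss = summarize_ping(sample)
print(runs)                 # [(True, 8), (False, 5), (True, 5), (False, 8)]
print(f"{loss:.0f}% loss")  # 50% loss
```

Long alternating runs like these look different from the scattered single drops that an RF problem usually produces, which is what makes the run lengths worth recording.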

It was Not an RF Problem.

The “fix” is to reboot the client, either via the client GUI, a power cycle, or by choosing “reboot” from the A5 GUI, and while it’s rebooting disable the Client Management VLAN on the switch connected to the C5’s (for me it’s Netonix or Cisco). Once the client rejoins you can re-enable the Management VLAN and everyone is happy again.

It would be really nice if Mimosa could fix this feature.


#16

Happened again on the ‘other’ A5 yesterday. I have 3 of them deployed and this routinely affects the 2 that do PPPOE untagged and C5 MgMt Tagged. The other one which passes 4 Tagged VLANS and the MgMt VLAN to the C5’s has not had a problem in 8 months, but these other 2 it’s like once a week at least that someone calls in to complain about slow or no service.

If I leave the C5 Management VLAN disabled on the switch it doesn’t happen. But if I turn it on it’s almost guaranteed that one of the C5’s is going to exhibit this behavior within a few days.


#17

You totes migotes need to make sure Mimosa Support has gotten wind of this and their engineers have had a chance to see the problem/solution. Apparently new firmware is coming down the line so I would prefer that they got the fix for this in this next round. If it wouldn’t take me 2 days to setup, I would verify the issue with my own setup and show the support team, but I have my own dragons breathing down my neck…

Did you ever get to try Pinging MAC addresses when the C5 is down and see what happens?


#18

Does @Support summon ?

Yes, no response to MAC ping.


#19

@DustinS is the closest I know of, but when I want to get to the support team quickly I just go to chat. (Bottom Right of the forum screen “Chat with us”)

These guys are really good. They will probably want SSH to the radios or Teamviewer (I don’t let my radios have internet access, so Teamviewer is the only route I have used), so it is helpful to set that up on a spare computer beforehand.


#20

Hi guys,

I do frequent the Community, but I have been traveling a lot lately (I fly out again tomorrow morning). Your best solution would be to follow William’s advice and go to Mimosa Support chat.

From that point, they will get engineering involved and they will most likely ask for access to these radios for SSH so they can dig deeper into the issue you’re experiencing.

Support is available until 7pm PDT tonight and then opens back up at 11pm PDT. I do recommend coming in tonight and talking with Art so he can get the ball rolling.