Crawl scans collecting hreflang URLs between stores

So my site uses a multi-store setup, and most of the stores were set up as subfolders: store.tld is my default store, and store.tld/en-ca, store.tld/en-gb, store.tld/en-au, etc. are subfolders. This is my full structure:

  1. store.tld
    1. store.tld/en-ca
    2. store.tld/en-gb
    3. store.tld/en-au
    4. store.tld/en-in
    5. store.tld/ ETC
  2. store2.tld
  3. store3.tld

My hreflang tags for store.tld are simple: store.tld is x-default, and every other store is tagged with its own hreflang language.

When I scan, let’s say, store.tld, it only crawls URLs containing store.tld. But if I scan store.tld/en-ca, it shows all URLs from store.tld/en-ca and also a ton of (maybe all) store.tld URLs too! Why is this happening?

  1. I’m guessing it is crawling the x-default store.tld URL in the hreflang tag of the store.tld/en-ca pages. So it’s crawling from store A to store B, but does it go back again? More on that in #3.
  2. Why is it not crawling any URLs from /en-gb or /en-au? If it were reading the hreflang tags, why not read all of them? Is it because it only focuses on x-default?
  3. Does any of this matter as far as SEO and site structure go? If Googlebot crawls a URL like store.tld/en-ca/* looking for /en-ca/ content and somehow crawls onto the store.tld x-default store, won’t the en-ca hreflang tag on those pages tell Googlebot that the store.tld URLs are not for en-ca and that it should get back to the store.tld/en-ca URLs instead?

Thanks for any advice. This is bugging me. Again, if I scan a store view that uses its own domain, like store3.tld or store2.tld, this is not happening, because store2.tld and store3.tld are not in the hreflang tags of the store.tld site. But a subfolder that extends from store.tld is a different story. I assume a crawl tool sees a subfolder as nothing more than a page or an extension of the site, so maybe that is how it is treating it?
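To make that guess concrete, here is a minimal Python sketch of the two ways a crawler could decide whether a discovered URL is "part of the site". This is only an illustration of my assumption, not how any particular crawl tool actually works: the function names, the `https://` scheme, and the example paths are all made up.

```python
from urllib.parse import urlsplit


def in_scope_by_host(start_url: str, candidate: str) -> bool:
    """Host-only scoping (hypothetical): same host as the start URL means in scope."""
    return urlsplit(start_url).hostname == urlsplit(candidate).hostname


def in_scope_by_prefix(start_url: str, candidate: str) -> bool:
    """Stricter scoping (hypothetical): same host AND inside the start URL's folder."""
    start, cand = urlsplit(start_url), urlsplit(candidate)
    # Normalise both paths to end in "/" so "/en-ca" does not match "/en-cab".
    prefix = start.path.rstrip("/") + "/"
    cand_path = cand.path if cand.path.endswith("/") else cand.path + "/"
    return start.hostname == cand.hostname and cand_path.startswith(prefix)


if __name__ == "__main__":
    start = "https://store.tld/en-ca/"  # the store view being scanned
    discovered = [
        "https://store.tld/en-ca/some-product",  # same store view
        "https://store.tld/some-product",        # x-default store, same host
        "https://store2.tld/some-product",       # separate domain
    ]
    for url in discovered:
        print(url, in_scope_by_host(start, url), in_scope_by_prefix(start, url))
```

If the tool scopes by host only, the x-default store.tld URLs count as the same site as store.tld/en-ca, while store2.tld and store3.tld never would. That matches what I see for those stores, but it would not explain why the /en-gb and /en-au URLs are missing.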

My site does use MageWorx SEO Suite if that helps.

submitted by /u/kassius79