Web traffic problem. Bot wont go away

bot


#1
ParseRaider

    New Member

  • Member
  • Pip
  • 3 posts

Hi...

I have a site that is going to start selling its products this month. I am frantically building social sites and our website.

I was wondering about site rank, so I opened an account on Alexa and asked to be certified. I had already blocked a lot of worthless bots through .htaccess and robots.txt.

 

Alexa is supposed to be a legit site, and it is owned by Amazon, which is also supposed to be a good site?

 

At any rate, I first battled to let the bots into the site, as they were generating 403 errors to the tune of 6 lines every 5 minutes around the clock, for a total of 33K in 7 days.

 

After the site was certified, I asked for them to go away. As the bot "Alexabot" was their certification bot, I thought its business was done, and either way I did not need 6 lines of 302 or 403 errors every 5 minutes.

 

And then I noticed something really weird...

 

 

179.61.147.3 - - [26/Feb/2016:17:25:16 -0500] "GET /about-us.html HTTP/1.1" 200 25295 "http://www.google.co...x=&startPage=1" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727)"

184.72.29.109 - - [26/Feb/2016:17:27:40 -0500] "GET /about-us.html HTTP/1.1" 200 25295 "http://www.google.com/search?q=US+mail+hold+-site%3Ausps.com+-site%3A.gov&tbs=qdr:d&rls=com.microsoft:en-us&ie=UTF-8&oe=UTF-8&startIndex=&startPage=1" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.28) Gecko/20120306 Firefox/3.6.28"

54.177.174.134 - - [26/Feb/2016:17:27:51 -0500] "GET /about-us.html HTTP/1.1" 200 25295 "http://www.google.com/search?q=US+mail+hold+-site%3Ausps.com+-site%3A.gov&tbs=qdr:d&rls=com.microsoft:en-us&ie=UTF-8&oe=UTF-8&startIndex=&startPage=1" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.28) Gecko/20120306 Firefox/3.6.28"


Here are my search results using the standard AWStats program:

 

6 different keyphrases                    Search    Percent
us mail hold -site usps.com -site .gov    3         33.3 %
pickle packer lids                        2         22.2 %
pickle pushing no                         1         11.1 %
probiotic jar                             1         11.1 %
www.ultimatepicklejar.com                 1         11.1 %
pickle pushing                            1         11.1 %

 

As you can see, I am not a great place to visit according to Google or any other search engine for that matter.

 

Here is my question: for those 3 lines to show up in AWStats, wouldn't a user have to find a link to my site from that search page?

 

Here are the hits for basically the last 10 days. They add up to more than my legit traffic.

 

HTTP status                                          Hits      Percent    Bandwidth
403 Forbidden                                        34,063    81.9 %     18.28 MB
302 Moved temporarily (redirect)                     4,034     9.7 %      805.04 KB
404 Document Not Found (hits on favicon excluded)    2,345     5.6 %      19.56 MB
301 Moved permanently (redirect)                     922       2.2 %      125.49 KB

 

I have repeatedly asked Alexa to stop, and they do not honor robots.txt either.

 

Also, running a basic IP lookup on the 3 addresses that used the Google search showed them to be from Amazon?

I am not a coder. My handle is a spoof of Darth Vader and is a carryover from my gaming days.

 

179.61.147.3

54.177.174.134

184.72.29.109


Edited by ParseRaider, 28 February 2016 - 11:51 AM.



#2
admin

    Founder Geek

  • Administrator
  • 24,540 posts

Hi ParseRaider, and welcome to Geeks to Go! :D

 

I really wouldn't worry about blocking bots. You'll end up pulling your hair out, and the bad bots will just spoof their names and ignore robots.txt anyway. Amazon does have its own search engine and indexing spider.

 

Regarding search terms, they are hard to find since Google started using SSL (HTTPS) for searches by default for all logged-in users. Sign up for Google's Search Console (formerly Webmaster Tools). That is the only place they share info on your search term keywords.
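
For anyone who wants to see exactly what a log line claims about a search, here is a minimal Python sketch that pulls the `q=` search phrase out of the referer field. It assumes the server writes Apache's "combined" log format, which is what the lines quoted in the first post look like; the names `LOG_RE` and `search_phrase` are made up for this example. Keep in mind that the referer is whatever the visiting client chooses to send, so a Google referer in the log does not by itself prove that the page was listed in that search.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed layout (Apache "combined" format):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def search_phrase(line):
    """Return the q= value of a Google referer, or None if there is none."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None  # not a combined-format line
    ref = urlsplit(m.group("referer"))
    if "google." not in ref.netloc:
        return None  # referer is not a Google URL
    values = parse_qs(ref.query).get("q")
    return values[0] if values else None

# One of the log lines quoted in the first post.
sample = (
    '184.72.29.109 - - [26/Feb/2016:17:27:40 -0500] '
    '"GET /about-us.html HTTP/1.1" 200 25295 '
    '"http://www.google.com/search?q=US+mail+hold+-site%3Ausps.com'
    '+-site%3A.gov&tbs=qdr:d&rls=com.microsoft:en-us&ie=UTF-8&oe=UTF-8'
    '&startIndex=&startPage=1" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.28) '
    'Gecko/20120306 Firefox/3.6.28"'
)
print(search_phrase(sample))
```

The phrase it prints is what a stats program has to work with when it builds a keyphrase list from referers, which is why a handful of identical requests can show up as a "search" even if no person clicked a result.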



#3
ParseRaider

    New Member

  • Topic Starter
  • Member
  • Pip
  • 3 posts

True, but when 60% of the lines come from a bot that really isn't trying to access anything, searching for relevant data is hard, as there is a limit to how much I can view.

And then again, the question is how my site showed up in that search, if that is how search keywords and AWStats work...

It's just weird, as is the fact that Alexa will not call off its dogs once they have accomplished the chore they were asked to do.

Thanks for the reply though, and yes, bots are a mess.

I do have Google Analytics. But AWStats is an old, established stats program, so why would those 3 lines pop up all of a sudden, out of the blue? Stray shots? I'm just curious.


Edited by ParseRaider, 28 February 2016 - 06:50 PM.


#4
admin

    Founder Geek

  • Administrator
  • 24,540 posts
80-85 percent of your search traffic will come from Google. Study it, embrace it. Google Analytics and Search Console will get you 90% of the way there. Ignore the rest.

#5
ParseRaider

    New Member

  • Topic Starter
  • Member
  • Pip
  • 3 posts

I see what you mean..

But at the risk of being a dummy... well, being worse than a dummy... I have 20K legit hits and 45K bot hits, of which 14K are JUST the Alexa certify bot. Another 5K are from Exabot, which rides the shirttails of Alexabot, as I had to re-allow it in .htaccess in order to allow Alexabot. Which is weird, I thought, as it means I would be blocking "AlFred" by blocking "Fred", which I would have thought would require a wildcard in the line of code?
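
A likely explanation for the "AlFred"/"Fred" effect, sketched in Python rather than in .htaccess syntax: Apache's user-agent matching (for example `BrowserMatchNoCase`, `SetEnvIfNoCase`, or a `RewriteCond` with the `[NC]` flag) treats the pattern as a regular expression and matches it anywhere in the string unless it is anchored, so no wildcard is needed. This is a sketch under the assumption that the block rule was written that way; the function name `is_blocked` is made up, and the bot names are simply the ones used in this thread, not full real user-agent strings.

```python
import re

def is_blocked(user_agent, pattern):
    """Mimic an unanchored, case-insensitive regex match on the user agent."""
    # re.search succeeds if the pattern occurs anywhere in the string.
    return re.search(pattern, user_agent, re.IGNORECASE) is not None

# "Alexabot" contains the letters "exabot", so a rule for one catches both.
print(is_blocked("Alexabot", "Exabot"))
print(is_blocked("AlFred", "Fred"))

# Anchoring the pattern to the start of the string separates the two names.
print(is_blocked("Alexabot", r"^Exabot"))
print(is_blocked("Exabot", r"^Exabot"))
```

If the rule really is a regex, anchoring it (or matching on a longer, more specific piece of the user-agent string) would be one way to block one bot without touching the other.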

 

As for the Google traffic, I got 3 hits on the questionable search term,

and maybe 8 more single phrases in a month, each of which was relevant and got one hit.

 

I could post my AWStats page if you want to see it, and I will change it later, as it is not password protected anyway. I just don't want you thinking I was trying to get free publicity.



#6
admin

    Founder Geek

  • Administrator
  • 24,540 posts
Also, remember that if this is a new domain, Google will "sandbox" it for 4-6 months, over which time it will slowly increase its search engine result positioning.