
How to prevent people from scraping your website?

#1
Does anyone have good tips on how to protect my site from people scraping it?
#2
We can't fully protect a site; we can only make it harder to scrape.
Compare the HTTP headers your site receives from a normal browser with those a scraper sends, and you will find some differences.
Use those differing headers to make your site harder to scrape.

Improve it yourself, or you can PM me and I will tell you, because you are my brother :)

If we share the details here, everybody will know which HTTP header to check, and once the scraper makers know it, all our work is for nothing :)

But be aware that your site may also lose some visitors who use unusual browsers :)
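
To give the idea without revealing the actual check, here is a minimal PHP sketch that rejects requests missing headers a mainstream browser sends on virtually every request. The headers checked here are purely illustrative, not the one I mean:

<?php
// Illustrative only: naive scrapers often omit request headers that real
// browsers always send. Which header(s) you actually compare is up to you.
$required = ['HTTP_USER_AGENT', 'HTTP_ACCEPT', 'HTTP_ACCEPT_LANGUAGE'];

foreach ($required as $header) {
    if (empty($_SERVER[$header])) {
        // Missing a header every mainstream browser sends; likely a bot.
        header('HTTP/1.1 403 Forbidden');
        exit('Forbidden');
    }
}

// Request looks like a normal browser; serve the page as usual.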
#3
@ogah
Ah yes, you're right. I don't know why I forgot that.
#4
The only SURE way to protect your site from scraping is to require a user login to view any content. This is why I require registration on my sites to view most information other than the portal page and a few select post/page excerpts.
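
A minimal sketch of that gate in PHP, assuming your login script sets $_SESSION['user_id'] on successful authentication (the variable name is just an example):

<?php
// Minimal login gate: assume login.php sets $_SESSION['user_id'] after a
// successful authentication. Every protected page starts with this check.
session_start();

if (empty($_SESSION['user_id'])) {
    // Not logged in: send the visitor to the login page, serve nothing.
    header('Location: /login.php');
    exit;
}

// Only registered, logged-in members reach this point.
echo 'Members-only content goes here.';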
#5
(05-01-2014, 08:27 PM)Zephyron Wrote: The only SURE way to protect your site from scraping is to require a user login to view any content. This is why I require registration on my sites to view most information other than the portal page and a few select post/page excerpts.


Yes, that is what I do with my sites: require registration on everything except my portal page(s). In some sections with sensitive data, I require not only registration, but also that the person wishing to see the sensitive data be approved by an administrator before their membership is accepted. For forums, I also require email validation AND a captcha to eliminate spambots altogether. Seems to work quite well.
#6
(05-01-2014, 08:27 PM)Zephyron Wrote: The only SURE way to protect your site from scraping is to require a user login to view any content. This is why I require registration on my sites to view most information other than the portal page and a few select post/page excerpts.
It's still scrapable. A scraper can pass the login cookies in its script :)
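
For example, once a scraper has logged in manually and captured the session cookie, it can replay it on every request. A short PHP/cURL sketch (PHPSESSID is just PHP's default cookie name; the token is a placeholder):

<?php
// Sketch of the attack: replay a captured session cookie with cURL so the
// server treats the scraper as a logged-in member.
$ch = curl_init('https://example.com/members-only-page');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// PHPSESSID is PHP's default session cookie name; the value is a placeholder.
curl_setopt($ch, CURLOPT_COOKIE, 'PHPSESSID=placeholder-session-token');
$html = curl_exec($ch);
curl_close($ch);

// $html now holds the members-only page, ready to be scraped.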
#7
(05-02-2014, 01:55 AM)ogah Wrote: It's still scrapable. A scraper can pass the login cookies in its script :)

Not necessarily true. Not all logins set a persistent BROWSER cookie; it depends on your script. If your script ties each session to the IP address that was written to your database upon member registration, then scrapers have a heck of a time getting in if their IP is not already in that database. Think about that for a moment or two....

Now this can be a bit of a problem when we talk about static vs. dynamic IPs for your members/customers; while a static IP will remain the same, a well-written script can detect whether a given dynamic IP falls within an acceptable range for the serving domain based on a previous login.

Browser cookies are too easy to hack. Use a database solution...
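
A rough sketch of the idea, assuming a members table with an ip_address column written at registration (the table and column names are just examples):

<?php
// Database-backed session check: the member's IP was stored at registration,
// and every request is validated against that record instead of trusting a
// browser cookie alone.
session_start();

$pdo = new PDO('mysql:host=localhost;dbname=site', 'dbuser', 'dbpass');

// Table and column names ('members', 'ip_address') are examples only.
$stmt = $pdo->prepare('SELECT ip_address FROM members WHERE id = ?');
$stmt->execute([$_SESSION['user_id'] ?? 0]);
$storedIp = $stmt->fetchColumn();

if ($storedIp === false || $storedIp !== $_SERVER['REMOTE_ADDR']) {
    // No record, or the request comes from an unknown IP: reject it.
    header('HTTP/1.1 403 Forbidden');
    exit('Forbidden');
}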
#8
A common trick is to load your headers, footers, etc. as usual, and then use AJAX to call the content (onLoad)...

This creates a delay before the content arrives, so a 'regular' scraper will just fetch your page and (probably) not the content...

Not perfect, but it will defeat less sophisticated scrapers...
#9
(05-10-2014, 10:52 PM)CSense Wrote: A common trick is to load your headers, footers, etc. as usual, and then use AJAX to call the content (onLoad)...

This creates a delay before the content arrives, so a 'regular' scraper will just fetch your page and (probably) not the content...

Not perfect, but it will defeat less sophisticated scrapers...
Good idea. But as far as I know, the visitor can still see exactly what our AJAX URL is if it uses HTTP GET. I don't know whether that holds if it's built with HTTP POST. I've never used AJAX because I'm still learning it.
#10
(05-11-2014, 06:11 AM)jaran Wrote:
(05-10-2014, 10:52 PM)CSense Wrote: A common trick is to load your headers, footers, etc. as usual, and then use AJAX to call the content (onLoad)...

This creates a delay before the content arrives, so a 'regular' scraper will just fetch your page and (probably) not the content...

Not perfect, but it will defeat less sophisticated scrapers...
Good idea. But as far as I know, the visitor can still see exactly what our AJAX URL is if it uses HTTP GET. I don't know whether that holds if it's built with HTTP POST. I've never used AJAX because I'm still learning it.

I never (OK, very rarely) use GET with AJAX, for exactly the reason you gave: too much is exposed for malicious folks to view...

Have onload='getContent();', which calls a JS routine (in a separate .js file, so the routine itself is not visible) that uses AJAX to call a 'getContent.php' script, which uses pre-defined SESSION vars or passed POST vars to determine what to fetch, and your content gets inserted into whatever div you specified as the return element...

...and all the user can see is that you're calling a JS script on load -- nothing else is revealed...
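
A minimal sketch of what that getContent.php could look like; the session var, the 'section' POST var, and the content paths are just examples:

<?php
// getContent.php: sketch of the endpoint described above. It relies on
// session state and POSTed vars, never on anything visible in a GET URL.
session_start();

// Only answer POST requests from logged-in sessions.
if ($_SERVER['REQUEST_METHOD'] !== 'POST' || empty($_SESSION['user_id'])) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}

// 'section' is an example POST var naming which content block to return;
// whitelist the allowed values rather than trusting the client.
$allowed = ['news', 'articles', 'downloads'];
$section = $_POST['section'] ?? '';

if (!in_array($section, $allowed, true)) {
    header('HTTP/1.1 400 Bad Request');
    exit;
}

// Return the HTML fragment that the JS routine inserts into the target div.
echo file_get_contents(__DIR__ . "/content/{$section}.html");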

HTH,

CSense
#11
If you are using the Apache web server, you can add the -Indexes option within your configuration file:

<Directory /path/to/your/site>
   Options -Indexes
</Directory>


The next time somebody tries to browse a directory that has no index file, Apache will display "Forbidden". Note that this only blocks directory listings; it does not stop the pages themselves from being scraped.