

Auto-Downloads Program - Is it possible?



#1
DaniD

    New Member

  • 1 post
Hey everyone,

I've been wondering for a couple of days now: is it possible to write a simple program that automatically downloads new content posted on a website?

A bunch of friends and I were discussing the possibility of a program that would automatically download any new content our lecturers post on our university student portals, and I've hit a roadblock. I don't know where to look or where to go from here. I thought of writing the code myself, but I'm totally useless with coding and wouldn't know where to start.

If anyone has any ideas, info or can point me in the right direction, that would be highly appreciated.

#2
MS-Free

    Member

  • 425 posts
I'm sure it's possible. How much work it takes to pull the new content will depend, to a fair degree, on how the content is posted and laid out. Another consideration: which protocol are you grabbing the content over? Can you get it via FTP, or do you have to do it strictly over HTTP?

I can see in my mind's eye how you might go about doing something like this. (Being a Linux user, I'm envisioning something involving wget, but I'm not sure how well that would actually work.)
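For the download step itself, here is a minimal Python sketch using only the standard library's `urllib.request` (roughly what `wget -O page.html URL` does from a shell). It assumes a page that needs no login, and it falls back to UTF-8 when the server doesn't declare a charset; the name `fetch_page` is just an illustrative choice.

```python
import urllib.request


def fetch_page(url: str, timeout: float = 30.0) -> str:
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset declared in the Content-Type header, else assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read()
    # Replace undecodable bytes rather than crashing on a badly labelled page.
    return body.decode(charset, errors="replace")
```

Something like `html = fetch_page("https://example.com/")` would return the page source as a string, ready for the change checks discussed further down.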

The other thing to think about, which complicates things further: since you're pulling this from a student portal, how are you going to handle authentication?
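As a rough illustration of only the simplest case (a plain HTML login form that sets a session cookie), the Python sketch below builds a login POST and an opener that keeps cookies between requests. The form field names `username` and `password`, and the helper names, are hypothetical: a real portal defines its own fields and may require hidden tokens, and many university portals use single sign-on systems with redirects that this sketch won't handle.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session() -> urllib.request.OpenerDirector:
    """Return an opener that stores cookies, so a session cookie set at
    login is sent automatically with later requests through the same opener."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_login_request(login_url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST for a login page.

    The field names below are hypothetical; inspect the portal's login form
    for the real names and any hidden fields it expects.
    """
    form = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(login_url, data=form.encode("ascii"), method="POST")
```

Usage would look like `session = make_session()`, then `session.open(build_login_request(login_url, user, pw))`, then `session.open(page_url).read()` for each page you want, with `login_url` and `page_url` replaced by the portal's real addresses.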

Sounds like a fun project... especially if you're of a scripting nature.

#3
W-Unit

    Member

  • 170 posts
It is indeed possible; however, authenticating to a student portal is not going to be an easy task.
I have written many programs which gather information from websites, all using HTTP. Here's what I would do:
1. Connect to the website, authenticate, and retrieve the source code for the page(s) you're interested in.
2. Compute the MD5 hash (or any other hash of reasonable bit length; MD5 is just a handy example) of the page's source and record it to the hard drive. If it differs from the previously recorded hash, continue; otherwise you know the page(s) haven't changed since the program last ran, so you can halt execution and just output "No updates" or whatever.
3. Use a regexp to glean the information that interests you from the page's source. If part of the information is in a bulletin-board style, where some entries are old and some new, you'll need a hash table structure to keep track of which information the program has already reported. I would do this by writing a hash algorithm, then computing the hash of each entry as it is gleaned and storing it in a hash table that gets loaded from the hard drive each time the program runs. Only information whose hash is not already in the hash table should be considered "new." This is the most efficient method, but it requires understanding hash tables and writing that hashing algorithm. If you don't know how to do that, you can use some other method (probably just linear searching) to check information against the database of "old" information to determine whether the data in question is "new." This will have significant performance costs compared to a hash table, and personally it always drives me crazy to know I could've written a substantially more efficient program, but perhaps this is satisfactory for you.
4. Output all the "new" information via your method of choice.
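Steps 2 to 4 could be sketched in Python roughly as follows. This is a hedged illustration under stated assumptions, not necessarily how W-Unit would write it: the state file names and the regular expression (which assumes new content shows up as links to `.pdf` files) are hypothetical placeholders. Python's built-in `set` is already a hash table, so you get the fast lookup described in step 3 without hand-writing one.

```python
import hashlib
import json
import re
from pathlib import Path
from typing import List

# Hypothetical state files; any writable location works.
PAGE_HASH_FILE = Path("page_hash.txt")
SEEN_FILE = Path("seen_entries.json")

# Hypothetical pattern: assumes each item of interest is a link ending in .pdf.
# A real portal's markup will need its own expression (or an HTML parser).
ENTRY_PATTERN = re.compile(r'href="([^"]+\.pdf)"')


def digest(text: str) -> str:
    """MD5 hex digest of a string (fine for change detection, not for security)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def page_changed(html: str, hash_file: Path = PAGE_HASH_FILE) -> bool:
    """Step 2: compare the page hash with the one saved last run, then save the new one."""
    new_hash = digest(html)
    old_hash = hash_file.read_text().strip() if hash_file.exists() else None
    hash_file.write_text(new_hash)
    return new_hash != old_hash


def new_entries(html: str, seen_file: Path = SEEN_FILE) -> List[str]:
    """Step 3: extract entries and return only those not reported on earlier runs."""
    # Load previously seen entry hashes into a set (a hash table) for fast lookups.
    seen = set(json.loads(seen_file.read_text())) if seen_file.exists() else set()
    fresh = []
    for entry in ENTRY_PATTERN.findall(html):
        key = digest(entry)
        if key not in seen:
            seen.add(key)
            fresh.append(entry)
    # Persist the updated set so the next run knows what was already reported.
    seen_file.write_text(json.dumps(sorted(seen)))
    return fresh


def check(html: str, hash_file: Path = PAGE_HASH_FILE, seen_file: Path = SEEN_FILE) -> List[str]:
    """Steps 2 and 3 combined: return new entries, or [] if the page is unchanged."""
    if not page_changed(html, hash_file):
        return []
    return new_entries(html, seen_file)
```

After fetching the page source into `html`, step 4 could be as simple as printing each item returned by `check(html)`, or "No updates" when the list is empty.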

Like I said, the kicker here is going to be authenticating. There are many different types of authentication, and universities, being large institutions with easy access to lots of money and expertise, tend to have some of the most complicated, and least documented, systems in place in the interest of security. You're not doing anything illegal, of course, but I suspect you'll have to be something of an expert to figure out how to authenticate, and people may be suspicious of your trying to figure this out. If you want to post the website you're trying to access here (and it's understandable if you don't), perhaps someone could tell you how to go about authenticating, but I myself wouldn't be too hopeful.
If you do manage to understand how your portal's authentication system works, everything else is within my personal knowledge and I'd be happy to help with any other parts you don't understand, as I'm sure many other folks here would as well.

Cheers and best of luck :)

Edited by W-Unit, 07 June 2011 - 01:06 AM.






