Learning Python

Our way of living has changed considerably over the last few years. Nowadays almost everything relies on computers in some way; even a clothing shop uses some kind of software to keep track of its clients. As this becomes more common, there is a growing need for people who are able to create such software. That is why I am going to show you how and where you can learn to create software, i.e. to program. We will use Python as our programming language.
Why Python?
Python is a relatively new programming language compared to C and other older languages. Because it is a more modern, higher-level language, it is easier to use and does not require as much in-depth understanding. It is also a good language to start with because you can see the results of what you programmed, at least partly, right away: you can type a statement into the interactive interpreter and it is executed immediately.

To get started, download the latest version of Python and install it. If you have problems setting it up, visit the links below or post your problem here and I will get back to you.
Learning Python
To be able to use a programming language you first need a basic understanding of it. For that reason I suggest you read this Python tutorial. If you want even more in-depth knowledge, feel free to take a look at this article. After reading about Python, start to program a little bit. Don’t be frustrated if your program doesn’t work right away; that is how you actually learn to program. We learn through mistakes.
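
To give you an idea of what "programming a little bit" can look like, here is a minimal first program. It is only a sketch, written for Python 3; the function name greet and the example names are made up for illustration.

```python
# A tiny first program: build a greeting and print it for a few names.
def greet(name):
    """Return a friendly greeting for the given name."""
    return "Hello, " + name + "!"

# Call the function once for every name in the list.
for person in ["Ada", "Guido"]:
    print(greet(person))
```

Try changing the names or the greeting text and run it again. If you mistype something, Python prints an error message that names the line it stumbled over, which is a good place to start looking.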

Feel free to contact me if you have any questions.


Python – Finding Broken Links In An HTML File

I recently wanted to check an HTML file for broken links and did not want to buy a commercial program. So I wrote a program in Python which shows you the broken links in an HTML file. It only checks links to local files; links starting with http:// are skipped. Here is the code:

#!/usr/bin/env python

import os, sys

def usage():
    print("usage: %s <html file>" % sys.argv[0])
    print("checks the html file for broken links")

def fileExists(file):
    # os.stat raises OSError when the path does not exist
    try:
        os.stat(file)
        return True
    except OSError:
        return False

def extractLink(line, tag):
    # skip the tag and the opening quote, then read up to the closing quote
    index = line.find(tag)+len(tag)+1
    end = line.find("\"", index+1)
    link = line[index:end]
    return link

def getDirectory(file):
    # move index to the last "/" and return everything before it
    index = 0
    while file.find("/", index+1) > index:
        index = file.find("/", index+1)
    directory = file[:index]
    return directory

######################
# the main program starts here #
######################
if len(sys.argv) < 2:
    usage()
    sys.exit()

file = sys.argv[1]
with open(file, "r") as f:
    text = f.readlines()
linklist = []
tag = "href="
#extract the links from the text (at most one link per line)
for line in text:
    if tag in line:
        link = extractLink(line, tag)
        # discard results that still contain a quote, i.e. links that were not parsed cleanly
        if not "\"" in link and not "'" in link:
            linklist.append(link)

#work out the directory of the html file; relative links are resolved against it
if file.startswith("/"):
    directory = os.path.abspath(getDirectory(file))
else:
    directory = os.path.abspath(getDirectory(os.getcwd()+"/"+file))

if not directory.endswith("/"):
    directory = directory+"/"

print("-"*30)
print("missing file(s): ")
print("-"*30)
for link in linklist:
    if link.startswith("/"):
        #absolute path: check it as it is
        fl = link
        val = fileExists(fl)
        if not val:
            print(link)
    elif not link.startswith("http://"):
        #relative path: check it relative to the html file
        fl = directory+link
        val = fileExists(fl)
        if not val:
            print(link)

print("-"*30)

There are many ways to improve this program; for example, it could report the line on which each broken link appears. Feel free to post any improvements or comments. I hope this is helpful.
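
As one possible take on the line-number idea, here is a minimal sketch for Python 3. It is not a drop-in replacement for the program above: the names HREF_RE and find_broken_links are my own, it uses a regular expression instead of the string searching shown earlier, and it additionally skips https://, mailto: and in-page "#" links, which is an assumption about what you would want.

```python
import os
import re

# Matches href="..." or href='...' and captures the link target.
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)

def find_broken_links(html_path):
    """Return (line_number, link) pairs for local links whose target is missing."""
    base_dir = os.path.dirname(os.path.abspath(html_path))
    broken = []
    with open(html_path, "r") as handle:
        # enumerate(..., start=1) gives human-friendly line numbers.
        for line_number, line in enumerate(handle, start=1):
            for link in HREF_RE.findall(line):
                # Skip remote links, mail links and in-page anchors.
                if link.startswith(("http://", "https://", "mailto:", "#")):
                    continue
                # Drop any "#fragment" or "?query" suffix before checking the file.
                target = re.split(r"[#?]", link, maxsplit=1)[0]
                # os.path.join keeps absolute targets as they are (on POSIX systems).
                if not os.path.exists(os.path.join(base_dir, target)):
                    broken.append((line_number, link))
    return broken
```

You could then print the result with something like: for number, link in find_broken_links("index.html"): print(number, link). The file name index.html is just a placeholder.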

Bash/Python – Extracting Links From An HTML File

I recently needed to extract the links from an HTML file and, of course, wanted to automate the whole procedure. There is an easy way of doing this using the bash shell.

cat file | grep "href=" | cut -d"/" -f3

The output is rather ugly, so you can improve it by grepping for the domain name.

cat file | grep "href=" | cut -d"/" -f3 | grep domain

Here, domain stands for the domain name that the links should contain.
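
To see why cutting out the third "/"-separated field gives the host name for absolute links but something ugly for relative ones, here is a small Python 3 sketch of what that cut step does to a single line. The function name third_slash_field and the sample lines are made up for illustration.

```python
def third_slash_field(line):
    """Mimic `cut -d"/" -f3` for one line of text."""
    # cut passes lines without any delimiter through unchanged.
    if "/" not in line:
        return line
    fields = line.split("/")
    # cut prints an empty field when the line has fewer than three fields.
    return fields[2] if len(fields) > 2 else ""

absolute = '<a href="http://www.example.com/page.html">a page</a>'
relative = '<a href="docs/page.html">a page</a>'

# "http:" is field 1, the empty text between the two slashes is field 2,
# so the host name ends up in field 3.
print(third_slash_field(absolute))
# A relative link has no "//", so field 3 is just a leftover piece of markup.
print(third_slash_field(relative))
```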

Since I still wasn’t satisfied, I wrote a little program in Python which does much the same job as the commands mentioned above, except that it prints each link in full. Here is the code:

#!/usr/bin/env python

import sys

def usage():
    print("usage: %s <file> " % sys.argv[0])
    print("prints all the links contained in that file")

def extractLink(line, tag):
    # skip the tag and the opening quote, then read up to the closing quote
    index = line.find(tag)+len(tag)+1
    end = line.find("\"", index+1)
    link = line[index:end]
    return link

if len(sys.argv) < 2:
    usage()
    sys.exit()

file = sys.argv[1]
with open(file, "r") as f:
    text = f.readlines()
tag = "href="
# at most one link per line is extracted
for line in text:
    if tag in line:
        link = extractLink(line, tag)
        # skip results that still contain a quote, i.e. links that were not parsed cleanly
        if not "\"" in link and not "'" in link:
            print(link)

This may look like a lot of code, but it actually isn’t, considering that it was written in a higher-level language. Python is sensitive to indentation, so if you want to use this code, make sure it is aligned properly.
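
If you would rather not search for quotes by hand, Python's standard library includes an HTML parser. The following is a minimal sketch for Python 3 and not the method used above; the names LinkCollector and extract_links are my own. Unlike the program above, it only looks at <a> tags, and it also copes with several links on one line and with single-quoted attributes.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute
        # names arrive lower-cased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html_text):
    """Return all link targets found in a string of HTML, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links
```

To use it on a file, read the file into a string first, e.g. extract_links(open("page.html").read()), where page.html is a placeholder name.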

Feel free to post some improvements.