
Archive for May, 2010

Insert LaTeX math formula into Microsoft Powerpoint

IguanaTex is a freeware add-in that helps you insert math formulas and equations written in LaTeX into PowerPoint presentations. I don’t know exactly which versions of MS Office it works with, but it works at least with MS Office 2007. If you are familiar with LaTeX, it will definitely help you write math equations in PowerPoint quickly.

IguanaTex
http://www.technion.ac.il/~zvikabh/software/iguanatex/

There are many more options discussed in this forum thread:
http://www.physicsforums.com/showthread.php?t=93189


Python: ‘\r’ shows up in readlines() on Ubuntu

Does anybody know?

I use Python on Windows 7 to write a list to a text file, say list.txt. Then I use Python on Ubuntu 9.1 to read the text file back into a list with readlines(). I found that every newline (\n) I wrote comes back as \r\n in the list read by Python. What is happening here?
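One likely explanation, offered as an assumption: Python on Windows opens files in text mode by default, which turns every \n into \r\n on write. Python 2 on Linux does not translate \r\n back unless the file is opened in universal-newline mode ('rU'). In Python 3, open() translates all newline styles to \n by default. The sketch below (Python 3, with a temporary stand-in for list.txt) rebuilds a Windows-style file and compares the two ways of reading it.

```python
import os
import tempfile

# Stand-in for the list.txt written on Windows: text mode on Windows turns
# every "\n" into "\r\n", so we write those bytes directly.
path = os.path.join(tempfile.mkdtemp(), "list.txt")
with open(path, "wb") as f:
    f.write(b"apple\r\nbanana\r\ncherry\r\n")

# newline="" returns line endings untranslated, which mimics Python 2 on
# Linux reading in plain "r" mode: the "\r" stays at the end of each line.
with open(path, newline="") as f:
    raw_lines = f.readlines()

# Default mode (newline=None) is universal-newline mode: "\r\n" becomes "\n".
with open(path) as f:
    universal_lines = f.readlines()

# A portable clean-up that works either way: strip both characters.
clean = [line.rstrip("\r\n") for line in raw_lines]

print(raw_lines)        # ['apple\r\n', 'banana\r\n', 'cherry\r\n']
print(universal_lines)  # ['apple\n', 'banana\n', 'cherry\n']
print(clean)            # ['apple', 'banana', 'cherry']
```

In Python 2, the equivalent of the default Python 3 behavior is open(path, 'rU').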

Categories: iDea

Python code for Retrieving URLs and Tags from Del.icio.us

Goal: To retrieve URLs and tags

First, here is the format of the output files. There are 3 output files: (1) urls.txt, (2) users.txt and (3) tags.txt.

  • In urls.txt, each line is a URL, for example:
http://sneerwell.blogspot.com/2009/10/top-9-best-free-web-hosting-for.html
http://sethgodin.typepad.com/seths_blog/2008/12/lesson-learned.html
http://weblogs.variety.com/season_pass/2008/10/mad-men-qa.html
  • In users.txt, each line is the Delicious username of a user who tagged the retrieved URLs, for example:
filipesouza
Vera Legisa
jeroenbijnens
anishmisty
213db
newmediabias
tiagomarques
Marcel van der Laan
genetjin
  • In tags.txt, each line contains the tags that several users gave to the corresponding URL in urls.txt; one line holds the tags for one URL. Tags given by the same user are separated by spaces, and different users are separated by semicolons (;). For example, the 3rd line below shows that 3 users tagged the 3rd URL: the first user gave the tags (madmen tv interview television), the second (MadMen Variety), and the third (madmen).
web hosting ;erika ;best blogging hosting reference webhosting wordpress list ;
blog internet marketing technology creativity ;Career Innovation Business ;
madmen tv interview television ;MadMen Variety ;madmen ;
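If you want to load these files back into Python, a small parser for the tags.txt format could look like the sketch below. The function names are hypothetical. The sketch assumes urls.txt and tags.txt have the same number of lines, in the same order.

```python
def parse_tag_line(line):
    """Split one line of tags.txt into a list of per-user tag lists.

    Users are separated by ";" and each user's tags by spaces. Empty fields,
    such as the one after the trailing ";", are skipped.
    """
    users = []
    for field in line.rstrip("\r\n").split(";"):
        tags = field.split()
        if tags:
            users.append(tags)
    return users


def pair_urls_with_tags(url_lines, tag_lines):
    """Map each URL to its parsed tags; line i of tags.txt belongs to line i of urls.txt."""
    return {url.strip(): parse_tag_line(tags)
            for url, tags in zip(url_lines, tag_lines)}


example = "madmen tv interview television ;MadMen Variety ;madmen ;"
print(parse_tag_line(example))
# [['madmen', 'tv', 'interview', 'television'], ['MadMen', 'Variety'], ['madmen']]
```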

How do we retrieve the information from Del.icio.us?

This code is based on DeliciousAPI by Michael G. Noll; for more details, please refer to Michael’s website. Details about installing DeliciousAPI are given in my previous post. We retrieve the information as follows:

  1. Retrieve 100 URLs from Del.icio.us hotlist
  2. Retrieve all the usernames who tagged the URLs in the list
  3. For each username, retrieve all the URLs tagged by that username and add the URLs to the URL list
  4. Repeat steps 2 and 3 until the desired number of URLs in the list is reached
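The four steps above amount to a snowball crawl: URLs lead to users, and users lead to more URLs. The sketch below shows that loop in plain Python. It is not the author's code. The three get_* callables are hypothetical placeholders for whatever DeliciousAPI calls you use to fetch the hotlist, the users of a URL, and the URLs of a user.

```python
def crawl(get_hotlist, get_users_for_url, get_urls_for_user, target):
    """Snowball crawl following steps 1-4 (a sketch, not the author's code).

    Each get_* callable is a hypothetical placeholder that returns a list of
    strings. The result may overshoot target, because one user can add
    several URLs at once.
    """
    urls = list(get_hotlist())[:100]      # step 1: start from the hotlist
    users = []
    seen_urls, seen_users = set(urls), set()
    url_pos = 0                            # next URL whose users we fetch
    user_pos = 0                           # next user whose URLs we fetch
    while len(urls) < target:
        # Step 2: collect the users who tagged URLs not processed yet.
        while url_pos < len(urls):
            for user in get_users_for_url(urls[url_pos]):
                if user not in seen_users:
                    seen_users.add(user)
                    users.append(user)
            url_pos += 1
        if user_pos >= len(users):
            break                          # nothing left to expand
        # Step 3: add the URLs tagged by users not processed yet.
        while user_pos < len(users) and len(urls) < target:
            for url in get_urls_for_user(users[user_pos]):
                if url not in seen_urls:
                    seen_urls.add(url)
                    urls.append(url)
            user_pos += 1
    return urls, users
```

Keeping url_pos and user_pos as explicit positions is what makes the crawl resumable. Positions like these are the kind of state the program saves to its log file when Del.icio.us blocks the IP.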

Download the file here.

How to use the code

There are 2 Python (.py) files to use:

  1. retrieveURLsandUsers.py: Iteratively retrieves URLs and the corresponding users until the desired number of URLs is reached. It gives you a list of URLs (for example, 5000) in the file urls.txt.
  2. retrieveTags.py: Reads each URL from urls.txt and outputs the corresponding tags for each URL.

The next sections show how to run the programs in different situations.

Run for the first time

1. Open retrieveURLsandUsers.py and set the approximate number of URLs you want to retrieve. For example, to retrieve 5000 URLs, set

number_URLs_desired = 5000

2. Open the configuration file url_retrieve_log.txt and enter the following initial values:

1
0
0
0
0
0
0

3. Now you can run retrieveURLsandUsers.py.

Your IP is blocked by Del.icio.us!!!

There is some chance that your IP will be blocked by the web server. You just have to accept it and wait a couple of hours before you can retrieve information from the website again.
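The approach here is to wait and rerun by hand. If you would rather automate the waiting, a generic wrapper could look like the sketch below. It assumes your own fetching code raises a hypothetical BlockedError when Del.icio.us refuses a request. This post does not say how a block is detected, and the two-hour default only reflects the "couple of hours" above.

```python
import time


class BlockedError(Exception):
    """Hypothetical exception that your fetch code raises when the IP is blocked."""


def fetch_with_wait(fetch, *args, wait_seconds=2 * 60 * 60, max_attempts=5,
                    sleep=time.sleep):
    """Call fetch(*args). If blocked, wait and retry, up to max_attempts calls.

    The sleep function is a parameter so that it can be replaced in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(*args)
        except BlockedError:
            if attempt == max_attempts:
                raise                 # give up; saved state allows a later rerun
            sleep(wait_seconds)
```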

Do I have to start from the beginning again next time? No, we can resume from where we were blocked. When the website blocks our IP, all variables are saved to text files, and the status variables are stored in url_retrieve_log.txt. For example:

1       processID (1 or 2) that we will start with
334     url index that failed
177     url_list_start
350     url_list_end
23      user index that failed
12      user_list_start
350     user_list_end
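To make the resume mechanism concrete, here is a sketch of reading and writing url_retrieve_log.txt. It assumes the file holds the seven integers in the order shown, one per line. It also tolerates a description after each number, as in the example above. The field names are our own labels and are not necessarily the variable names used in retrieveURLsandUsers.py.

```python
# Field order as listed above; the names are our own labels (assumption).
FIELDS = [
    "process_id",          # 1 or 2: which stage to start with
    "failed_url_index",
    "url_list_start",
    "url_list_end",
    "failed_user_index",
    "user_list_start",
    "user_list_end",
]

# Values entered before the very first run: 1 followed by six zeros.
FIRST_RUN = {"process_id": 1, **{name: 0 for name in FIELDS[1:]}}


def parse_url_log(text):
    """Read the seven values, taking only the first token of each non-empty line."""
    values = [int(line.split()[0]) for line in text.splitlines() if line.strip()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def format_url_log(state):
    """Write the state back as one bare integer per line."""
    return "".join(f"{state[name]}\n" for name in FIELDS)


def load_url_log(path="url_retrieve_log.txt"):
    with open(path) as f:
        return parse_url_log(f.read())
```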

Therefore, as long as you don’t modify this file, you can always continue retrieving information from the website.

In other words, once the website unblocks your IP, just run retrieveURLsandUsers.py again and it will do everything for you.

Once I have 5000 URLs, what next?

Next, you will want to retrieve the corresponding tags for each URL you retrieved.

1. Open the log file tag_retrieve_log.txt. If you are running this program for the first time, you will see the following in the file:

whatever
0

2. Just leave it as it is.

3. Run retrieveTags.py and check the result.

4. The code will run smoothly for about 120 URLs, and then it will show the message “[#] Oh boy…Your IP was blocked again by Del.icio.us”. There is nothing to do but wait a couple of hours and then run it again.

5. The log file records where your IP was blocked, so that the program can restart from the right place. For example, if retrieval failed at the 401st URL, the log will show

http://www.iwit.nl/
401

Here 401 is the index of the URL that failed, and the first line is that URL. Leave the file alone; don’t edit it.
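Here is a sketch of how the two-line tag_retrieve_log.txt could be used to pick the restart position. This post does not say whether the stored index counts from 0 or from 1. The hypothetical helper below therefore first checks the URL at that position, then falls back to looking up the stored URL in the list. The first-run placeholder "whatever" matches nothing, so the helper starts from the beginning.

```python
def parse_tag_log(text):
    """Return (url, index) from the two lines of tag_retrieve_log.txt."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[0], int(lines[1])


def resume_position(all_urls, url, index):
    """Pick the position in urls.txt to restart from (hypothetical helper).

    Assumption: treat index as a 0-based position first; if the URL there
    does not match, search for the stored URL instead.
    """
    if 0 <= index < len(all_urls) and all_urls[index] == url:
        return index
    if url in all_urls:
        return all_urls.index(url)
    return 0  # e.g. first run: placeholder URL, so start at the beginning
```

The URL in the log therefore acts as a sanity check, so a one-off difference in how the index is counted does not skip or repeat a URL.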