Jun 01

This blog post comes about a year late, but who cares. This particular project is very important to me: it was a lot of fun, and it was my first step into Python/Django and sysadmin tasks like LDAP, SVN/Git and friends. The project was written together with my friend Cephalon (Γιάννης Σπανός) under the guidance of our professor Χρήστος Σωμαράς. It is still available at cronos.teilar.gr, but you won't be able to sign up unless you are a student at TEI of Larissa.

The Problem

My school had an absurd number of websites providing crucial announcements. It is very hard to follow all of them (knowledge gets lost when it is spread around), not to mention that some of them were unavailable during most weekends :P Hopefully that belongs to the past, as I haven't noticed anything recently, but back then the problem was really obvious: every one or two weekends there was a power cut, even during exam periods. There were also simply too many web services, and some students were not even aware of some of them (for example, the career.teilar.gr website has information about job positions, grad school programs etc). Another very important problem is that most of those websites require separate accounts, and I can't even imagine how many students didn't remember their credentials and did a password reset every new semester (remember, we are not talking only about CS students here; they are actually the minority, and unfortunately they are no exception). Last but not least, none of these websites provided RSS feeds, making them even harder to track (some of the major ones do now, though).

Brainstorming

So, what would be really useful is a web application that combines all those announcements in a syndicator-like page. A lot of other information (like grades, the semester's course declaration, the student's mail account, even a list of teachers) could also be included in that app. Most importantly, we need a unified RSS feed for all those announcements, since some people (like me) prefer desktop applications to read their news. We don't need the announcements of every school or every teacher, though: each student should be able to select what they want to see. Since the university websites don't provide RSS feeds, we'll have to parse the HTML pages instead and store the output in a database. Some of them require logging in first, so we need the users' credentials, stored encrypted with Blowfish (a reversible cipher, not a hash) so that the system can reuse them when needed. Everything looks good so far.
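To make the "unified feed" idea concrete, here is a minimal sketch using only Python's standard library. It is not the project's code (the project used Django's syndication framework, as described later in this post); the `Announcement` class, the source names and the `unified_feed()` helper are hypothetical. It keeps only the sources a student follows, sorts them newest first and emits an RSS 2.0 document:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET


@dataclass
class Announcement:
    source: str          # hypothetical source key, e.g. "e-class" or a teacher id
    title: str
    link: str
    published: datetime  # timezone-aware


def unified_feed(announcements, followed_sources, feed_title, feed_link):
    """Build an RSS 2.0 document from the sources one student follows."""
    # Filter by the student's selection, then sort newest first.
    selected = sorted(
        (a for a in announcements if a.source in followed_sources),
        key=lambda a: a.published,
        reverse=True,
    )
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Unified announcements"
    for a in selected:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"[{a.source}] {a.title}"
        ET.SubElement(item, "link").text = a.link
        # RSS 2.0 uses RFC 822 dates; format_datetime produces that format.
        ET.SubElement(item, "pubDate").text = format_datetime(a.published)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    items = [
        Announcement("e-class", "Project deadline", "http://example.org/1",
                     datetime(2010, 5, 1, tzinfo=timezone.utc)),
        Announcement("library", "New books", "http://example.org/2",
                     datetime(2010, 5, 2, tzinfo=timezone.utc)),
    ]
    print(unified_feed(items, {"e-class"}, "cronos", "http://example.org/"))
```

Filtering happens before sorting, so each student's feed only ever contains the sources they selected in their settings.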

Implementation

The first step was to set up the server. We got a new box and had to do everything from scratch: we installed and configured OpenLDAP, MySQL, Django with mod_python, Subversion, phpLDAPadmin, phpMyAdmin, WebSVN, and the Apache vhost files. Afterwards, we started playing with pycurl and BeautifulSoup. It took us a few days, but we finally had two Python scripts (standalone Django scripts, actually). The first one parsed the names of the schools, the teachers and the lessons, as well as the names of the web services from which we were going to collect announcements. The second one parsed the announcements of about 10 websites and stored them in the database. As I said, there were no RSS feeds, so we had to send thousands of HTTP requests with pycurl and parse the responses with BeautifulSoup. I know, very fragile, but there was no better way we could think of, and it worked just fine. Some of the websites:

  • www.teilar.gr: The most important website: it contains the announcements of all the teachers, school-specific announcements, and even college-wide and various other announcements.
  • e-class.teilar.gr: Equally important. e-class is a very popular web app among Greek universities, which offers tools for better communication between teachers and students. Apart from announcements, it also lets teachers upload and store files (like presentations or weekly projects), and it provides a calendar, a mini-chat, a contact form and more. Many teachers prefer it over www.teilar.gr for their announcements.
  • dionysos.teilar.gr: This one holds the student accounts. This is where students see their grades, declare classes for the semester, etc. School-wide announcements are also provided.
  • myweb.teilar.gr: webmail service (no announcements here, obviously :P )
  • some purely informational websites: LinuxTeam, PR, Library, Career, NOC and others (we are still adding new ones, actually).
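The scraping step can be sketched with nothing but the standard library. This is not the project's code: it swaps BeautifulSoup for Python's built-in `html.parser`, leaves out the pycurl download, and assumes a hypothetical page layout in which each announcement is a `<div class="announcement">` containing a link; the real teilar.gr pages would each need their own rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnnouncementParser(HTMLParser):
    """Collect (title, absolute URL) pairs from announcement blocks.

    Assumes a hypothetical layout: each announcement is a
    <div class="announcement"> that contains a link to the full text.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.depth = 0        # > 0 while inside an announcement <div>
        self.href = None      # link currently being read, if any
        self.text = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                self.depth += 1   # nested <div> inside an announcement
            elif "announcement" in (attrs.get("class") or "").split():
                self.depth = 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.href = urljoin(self.base_url, attrs["href"])
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            title = " ".join("".join(self.text).split())  # collapse whitespace
            self.results.append((title, self.href))
            self.href = None
        elif tag == "div" and self.depth:
            self.depth -= 1


def parse_announcements(html, base_url):
    """Return [(title, url), ...] for every announcement link in html."""
    parser = AnnouncementParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.results
```

Links outside announcement blocks are ignored, and relative links are resolved against the page URL so they can be stored and linked to directly.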

The two scripts were added to cron.d: the first runs weekly and the second hourly. The second one takes about half an hour to complete; on the old box (a very old Celeron laptop used as a server) it took 45-50 minutes. With all that data in the database, it was time to create the signup/login systems and then present the data to the users. The login system took a while to get ready. We played a lot with LDAP, python-ldap and some Django authentication backends to fully understand everything. We used the ldap_groups library, which had pretty much everything we wanted, although we had to tweak it a bit. In short, Django lets you use another authentication backend in addition to (or instead of) its ModelBackend, by adding the following to settings.py:

 
AUTHENTICATION_BACKENDS = (
    'path.to.custom.backend',  # the custom (LDAP) backend, tried first
    'django.contrib.auth.backends.ModelBackend',  # Django's default, tried next
)

When a user logs in, the custom backend (an LDAP backend in our case) first searches for the user in the Django database, and if the user is not found there, it searches the LDAP server. If the user is found in the LDAP server, their data is copied to the Django database. With that setup, the data stays in the LDAP server (where it can be reused easily) even if the database gets wiped out. Similarly, the signup system only checked the LDAP server for duplicate entries. The signup was a bit more complex, though. The system asks for the student's dionysos, e-class (optional) and webmail (optional) accounts, and to verify them it uses them immediately: it logs in with pycurl and parses the output. If that succeeds, it also parses the student's data, like name, semester, registration number, grades and the e-class lessons the student is subscribed to, and stores them in the LDAP server. Using Django's syndication framework, we were able to create a unified RSS feed of all those announcements. The pages we created were the following:

  • A front page which shows some personal information
  • A syndicator-like page with all the announcements
  • An e-class-specific page, which shows the lessons the student follows, recently uploaded files, and pending project deadlines
  • An e-mail page, which only shows the student's mail
  • A library page, where the user can search for books through the library.teilar.gr web page
  • A list of all the teachers, with the school they belong to and their email addresses
  • An about page
  • A settings page, where students can change their password and their credentials for the other websites, update their grades/declaration/e-class lessons list, and select the teacher/other announcements they want to follow
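The login flow described earlier (look up the user in the local database first, fall back to LDAP, and copy the LDAP entry into the database) can be sketched without Django or python-ldap. Everything here is hypothetical: a plain dict stands in for the Django user table, `FakeDirectory` stands in for the LDAP server, and `bind()` stands in for an LDAP simple bind. Verifying the password with a bind on every login is an assumption, since the post doesn't say how passwords were checked. In a real Django project this logic would live in the `authenticate()` method of the backend class listed in `AUTHENTICATION_BACKENDS`.

```python
from dataclasses import dataclass


@dataclass
class User:
    username: str
    full_name: str = ""
    semester: int = 0


class FakeDirectory:
    """Hypothetical stand-in for the LDAP server: uid -> (password, attributes)."""

    def __init__(self, entries):
        self.entries = entries

    def bind(self, uid, password):
        # Plays the role of an LDAP simple bind with the user's credentials.
        entry = self.entries.get(uid)
        return entry is not None and entry[0] == password

    def attributes(self, uid):
        return self.entries[uid][1]


def authenticate(username, password, local_db, directory):
    """Return a User on success, None otherwise.

    local_db is a dict playing the role of the Django user table.
    """
    # Assumption: credentials are always checked against the directory.
    if not directory.bind(username, password):
        return None
    # 1. Look for the user in the local (Django) database first.
    user = local_db.get(username)
    if user is None:
        # 2. Not there: copy the entry from the directory, so a wiped
        #    database is rebuilt transparently on the next login.
        attrs = directory.attributes(username)
        user = User(username, attrs.get("cn", ""), int(attrs.get("semester", 0)))
        local_db[username] = user
    return user
```

The point of the design is visible in the last branch: because the directory is the source of truth, clearing the local database only costs one extra lookup per user on their next login.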

The future

Our goal was to merge the LDAP server we created with the one the school uses. But due to some changes the NOC made, that turned out to be impossible, so I had to drop LDAP support from our application as well and use only the database.

After it went online, two other guys created a similar Django project as their thesis, which offers online registration for labs. The original plan was to merge the two projects, since the code had many similarities (especially the registration system, which used the same LDAP authentication), but in the end it became a separate web app, diogenis.teilar.gr (by Στέφανος Χρούσης and Γιώργος Τσιώκος).

About a month ago, the two services were moved to a new box, which was put under LinuxTeam's control and hosts two VMs (more on this in a separate blog post). We now have a long todo list, and people who are interested in contributing. Thanks to Δημήτρης Παπαπούλιος and Άλεξ-Π. Νάτσιος for taking care of the server and the VMs, Γιώργος Κούτσικος for the artwork and Γιώργος Τσιώκος for the design! Let's hope the service stays online for much longer, and that more thesis projects will be built on top of it.

Jan 05

Hello, this is my first actual post on the Linux planets :)

Introduction

Recently I wanted to set up svn+ssh access without giving the users a shell over ssh. The procedure is very simple, and it doesn't diverge much from the typical Subversion configuration. That is the first topic I'm going to cover. The second one is a Gorg installation that takes advantage of it and helps translation teams. Let me explain by telling you the whole story:

Some Greek geeks recently got together and wanted to create a Greek Gentoo community. The very first thing I wanted to do was to improve communication between translators, but also to motivate more people to contribute to the translations, even just by reviewing. The current model doesn't really allow that: only one or two people per language have CVS access to the translations (for Greek there is nobody at the moment). So if other people are also translating the documentation, they have to send patches, which has many drawbacks. What if two or more people are working on the same thing? What if someone finds a simple typo? Why should they have to create a patch for that? If a bunch of people could just fix those kinds of small mistakes in the documentation, without going through the process of creating patches, translation would progress much faster, and it would be easy for more people to contribute. Let's begin with the Subversion configuration.

Subversion

In fact, all I have done here is collect information; there is no special tweaking. This is going to be a quick installation, configuration and usage howto.

First of all, we install Subversion :P (in Gentoo it is dev-util/subversion). Then we create an svn user and group, setting its home directory to /var/svn. This is where our Subversion repositories will be stored.

useradd -m -d /var/svn -s /bin/bash svn

(For some extra security I set rbash as this user's shell; it seems to work, but make sure it doesn't break your hooks first.) The following steps should be done as the svn user, to avoid permission problems. So, we go to that directory and create a test repository:

svnadmin create test

The next step is to set up the users' accounts. We need the users' ssh public keys for this. In /var/svn/.ssh/authorized_keys we add one line per user, like the following:

command="svnserve -t -r /var/svn --tunnel-user=commiter1",no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty ssh-dss AAAAB3Nza.... user@host

Now, a little explanation of the above snippet. When a user runs an svn+ssh command, they actually log in to the system as the svn user. ssh then ignores whatever command the client requested and immediately runs the forced command svnserve, with some extra parameters: the path of the directory where the repositories reside, and the actual username of the committer (here commiter1). After that come some extra ssh options that add security (like disabling X11 forwarding). At the end is the user's public key. The last step is to configure the repository and specify who has read/write access, who has read-only access and who has no access at all. We have to edit two files, test/conf/authz and test/conf/svnserve.conf. In svnserve.conf we uncomment the following lines:

# optional: only if we want an anonymous svnserve running
anon-access = read
auth-access = write
authz-db = authz

Then we edit the authz file, where we specify the users' privileges. There are many ways to do this, using aliases, groups of people and extra permissions on subdirectories. There are examples inside the file, so I won't go into detail here. I'll just show a very simple configuration:

[test:/]
commiter1 = rw
commiter2 = rw
commiter3 = r
* = r
# Note: Of course the wildcard * = r covers the commiter3 = r entry
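To make the rules above concrete, here is a small sketch that reads an authz file with Python's `configparser` and reports what a given user gets from one section. It is a simplified model, not Subversion's implementation: `section_rights()` is a hypothetical helper that only understands named users and the `*` wildcard (no groups, aliases or per-path inheritance), and it combines the grants of every line that matches the user.

```python
import configparser


def section_rights(authz_text, section, user):
    """Return the rights ("", "r" or "rw") a user gets from one authz section.

    Simplified model: only named users and "*" are considered, and the
    grants of all matching lines are combined.
    """
    parser = configparser.ConfigParser(
        delimiters=("=",), comment_prefixes=("#",), interpolation=None
    )
    parser.optionxform = str  # keep usernames case-sensitive
    parser.read_string(authz_text)
    granted = set()
    for name, value in parser.items(section):
        if name in (user, "*"):
            granted |= set(value.strip())
    # Normalise to a canonical "r"/"rw" string.
    return "".join(ch for ch in "rw" if ch in granted)
```

With the example configuration, `commiter1` and `commiter2` get `rw`, while `commiter3` and everyone else get `r` through the wildcard, which is why the `commiter3 = r` line is redundant.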

With this svn+ssh setup we don't need a running svnserve daemon at all. Not running it prevents anonymous checkouts and allows us to close the svn port (3690 by default).

However, if you do want one, in Gentoo the file /etc/conf.d/svnserve is used to specify the user that runs the daemon, which should be the svn user. The SVNSERVE_OPTS variable can also contain the repository root path (--root=/var/svn). Debian does not provide an init script, but it is very easy to create a custom one, and a simple Google search will turn up plenty. Contact me if you need more info on this.
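With more than a handful of committers, it can help to generate the forced-command lines in /var/svn/.ssh/authorized_keys instead of editing them by hand. A minimal sketch, assuming a hypothetical mapping of svn usernames to public keys; `authorized_keys_line()` and `authorized_keys()` are illustrative helpers that reproduce the option list used earlier and refuse usernames containing whitespace, quotes or backslashes, which would break the `command="..."` option:

```python
SSH_RESTRICTIONS = "no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty"


def authorized_keys_line(committer, public_key, repo_root="/var/svn"):
    """Build one forced-command line for the svn user's authorized_keys."""
    if not committer or any(c.isspace() or c in '"\\' for c in committer):
        raise ValueError(f"unsafe committer name: {committer!r}")
    command = f"svnserve -t -r {repo_root} --tunnel-user={committer}"
    return f'command="{command}",{SSH_RESTRICTIONS} {public_key.strip()}'


def authorized_keys(committers, repo_root="/var/svn"):
    """committers: mapping of svn username -> public key ("type base64 comment")."""
    lines = (authorized_keys_line(user, key, repo_root)
             for user, key in sorted(committers.items()))
    return "\n".join(lines) + "\n"
```

The output can then be written to /var/svn/.ssh/authorized_keys as the svn user; sorting by username keeps the file stable between runs.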

The last part is to create an svnserve wrapper. gentoo-wiki provides one, which I have extended a bit (based on a script robbat2 gave me, which he uses on a Gentoo server):

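#!/bin/bash
# svnserve wrapper, installed as /usr/local/bin/svnserve so that the forced
# command in authorized_keys ("svnserve -t -r ... --tunnel-user=...") runs it.
export SSH_SESSION=1
# Log the date, Unix timestamp, local user, the command the client asked for,
# and the arguments passed from authorized_keys.
echo "$(date),$(date +%s) $USER (${SSH_ORIGINAL_COMMAND}) ($@)" >> /var/log/svnserve.log
# svn+ssh clients request exactly "svnserve -t"; reject anything else.
if [ "$SSH_ORIGINAL_COMMAND" == "svnserve -t" ] ; then
    export SSH_LOGIN=
    # New repository files are created group-writable.
    umask 002
    exec /usr/bin/svnserve "$@"
else
    exit 1
fi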

The extra part I added logs every connection and rejects it if the command requested over ssh is not svnserve -t. Place the script in /usr/local/bin/svnserve, make it executable, and make sure that "which svnserve" returns /usr/local/bin/svnserve instead of /usr/bin/svnserve. According to the gentoo-wiki article:

If the latter is the case, SSH does not search /usr/local/bin for the svnserve command. To change that, you can use the PAM module pam_env.so which is usually included in /etc/pam.d/ssh via system-auth. pam_env's config file is /etc/security/pam_env.conf and by adding PATH OVERRIDE=/usr/local/bin:/usr/bin:/bin you instruct it to set this particular path for all system-auth services. It appears this PAM module affects local login commands also, so check you have all the directories normally included in root's PATH included in this /etc/security/pam_env.conf entry.
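A quick way to check the PATH ordering described above is Python's `shutil.which()`, which accepts an explicit `path` string. A small sketch; `svnserve_resolves_to_wrapper()` is a hypothetical helper, and it only tells you how a given PATH string resolves, so run it as the svn user or pass it the PATH value you plan to put in pam_env.conf:

```python
import os
import shutil

WRAPPER = "/usr/local/bin/svnserve"


def svnserve_resolves_to_wrapper(path_value, wrapper=WRAPPER):
    """True if looking up `svnserve` along path_value finds the wrapper first."""
    return shutil.which("svnserve", path=path_value) == wrapper


if __name__ == "__main__":
    current = os.environ.get("PATH", "")
    print("svnserve resolves to:", shutil.which("svnserve", path=current))
    print("wrapper first:", svnserve_resolves_to_wrapper(current))
    # The PATH suggested in the quoted gentoo-wiki text:
    print("pam_env PATH ok:",
          svnserve_resolves_to_wrapper("/usr/local/bin:/usr/bin:/bin"))
```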

That covers pretty much everything needed for the installation. The following command shows how to check out the repository:

svn co svn+ssh://svn@server/repo

Since we specified -r /var/svn in authorized_keys, we don't need to type the whole path (the same applies to anonymous checkouts). Note that we use the svn user for the svn+ssh authentication, but commits are logged under the actual user, the one specified in --tunnel-user. An optional step is to install a web GUI for our svn repositories, like WebSVN or ViewVC. Their configuration is very easy and well documented, so I won't go into it. We are done with the Subversion configuration; now let's move on to something more Gentoo-specific.

Gorg with SVN

The main problem with Gentoo translations (and documentation translations in general, I suppose) is that they can't be handled by a Transifex or Pootle installation. So, what I am proposing here is a Gorg installation that serves a copy of the gentoo.org website, except for the /doc/XX folder, which will be a separate svn repository updated after every commit by a simple post-commit hook. The whole thing seems to work very well for the Greek language (/doc/el), and you can see a sample of my work at the following links: http://gorg.gentoo-el.org/doc/el/handbook and http://websvn.gentoo-el.org/listing.php?repname=gentoo-doc-el. (Note to other translation teams: if you are interested in this but don't have a server to host it, I'll be glad to host it.) The installation of Gorg is fully explained on Xavier's website, so I don't think I need to say anything more about it. Once it is up and running, we can do some further tweaking.

First of all, we create an svn repository (for example, gentoo-doc-xx). Then we delete the doc/xx folder from the CVS checkout we did earlier, and replace it with a checkout of the svn repository we just made, under the same name:

svn co svn://localhost/gentoo-doc-xx /path/to/your/document/root/doc/xx

Then we set up the hook. Hook templates are stored inside each repository, in the hooks subfolder. Just create a file named post-commit there, make it executable and add the following two lines to it:

#!/bin/bash
/usr/bin/svn export file:///var/svn/gentoo-doc-xx/ /path/to/your/document/root/doc/xx --force >> /var/log/svnserve.log
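If you'd rather not write hooks in shell, the same post-commit step can be sketched in Python. This is a hypothetical variant, not the hook shown above: Subversion passes the repository path and the new revision number as the first two arguments to post-commit, and runs hooks with an empty environment, which is why the sketch uses absolute paths throughout (the interpreter path in the shebang is an assumption about your system). Pinning the export to the committed revision with -r keeps it deterministic even if another commit lands while the hook is running.

```python
#!/usr/bin/python3
"""Hypothetical Python variant of the post-commit export hook."""
import subprocess
import sys

SVN = "/usr/bin/svn"
DOC_ROOT = "/path/to/your/document/root/doc/xx"  # placeholder, as in the text
LOG = "/var/log/svnserve.log"


def export_command(repos_path, revision, target=DOC_ROOT):
    """Build the `svn export` call for exactly one committed revision."""
    return [SVN, "export", "--force", "-r", str(revision),
            f"file://{repos_path}", target]


def main(argv):
    # post-commit is called as: post-commit REPOS-PATH REVISION [...]
    repos_path, revision = argv[1], argv[2]
    with open(LOG, "a") as log:
        result = subprocess.run(export_command(repos_path, revision),
                                stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```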

And that's pretty much it. Feel free to contact me with any suggestions or questions. The docs I used for this are the following: