March 2008 Archives
There are a lot of Linux distributions. There are some that I do not like because of their contents. There are two others that I do not like because of their names:
- I pronounce Fedora as fédóra. I am not sure what the correct pronunciation is, but if I read it as a Portuguese word (thus, with the stress on the second syllable), it reminds me of the Portuguese word "fedor", which translates to something like "bad smell".
- I pronounce Ubuntu as ubuntú. Why? Because if I pronounce it as a Portuguese word, it reminds me of the Portuguese word 'unto', which means fat, oily meat from the pig.
Maybe you know, maybe you don't, but I'll tell you anyway. Corpora is the Latin plural of corpus, and corpus is the word for body. In Natural Language Processing, a corpus (or body) is a set of collected texts that share some property (like language, probably genre, probably subject).
Major Natural Language Processing tasks are based on corpora. So, a Corpora mailing list was created some time ago (I would say more than 6 years ago).
I am a subscriber to the Corpora mailing list on ubi.no, and I received a mail I must quote. It was written by Dr. DJ Hatch, and says:
Actually, the genitalia comparison sucks. As far as I'm aware there growth is purely down to biological factors. (No learning can possibly be involved.)
Now, what was the first thing that crossed my mind when I read the first line? Crap! Junk on the mailing list. But I was mistaken, as this is a serious message discussing gender differences in language. I wonder if you would have guessed that from the extract.
Now, my apologies to Dr. DJ Hatch if he does not like having this published here. If so, please mail me from the same email address used to post the message on the list and I will remove this entry.
Following up on my last post, here are some more details. I am sure this is not a problem in my C code. It might be a problem in the way I am compiling it, but as the compiler is not complaining, I have no clue. I am giving up on my efforts to port Lingua::Jspell to Windows.
The compiler is MinGW, the one that comes with Strawberry Perl. Windows is XP Professional running under VMware.
This is the file I want to read:
C:\>dir port.hash
Volume in drive C has no label.
Volume Serial Number is 64D6-6505
Directory of C:\
03/22/2008 04:59 PM 2,020,994 port.hash
1 File(s) 2,020,994 bytes
0 Dir(s) 17,291,145,216 bytes free
This is me compiling it, as usual:
C:\>gcc -o teste.exe -Wall teste.c
And this is me running it:
C:\>teste.exe
Read 124/4640
Interestingly, this same program returns a different number of bytes read for different files (all bigger than 4K).
Here is my program, in case you feel like looking into the C code.
#include <stdio.h>
#include <fcntl.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
    char filename[100];
    void *buffer;
    int fd, bytes;

    buffer = malloc(50000);
    strcpy(filename, "port.hash");
    fd = open(filename, 0);
    bytes = read(fd, buffer, 4640);
    fprintf(stderr, "Read %d/4640\n", bytes);
    return 0;
}
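One possible explanation, offered as a guess rather than a confirmed diagnosis: the Windows C runtime opens files in text mode by default, and in text mode a Ctrl-Z byte (0x1A) is treated as end of file, so a 'read' on a binary file can stop early at a position that depends on the file contents. A minimal sketch of reading in binary mode follows. The helper name read_binary is hypothetical, and O_BINARY is defined as 0 on platforms that do not have it.

```c
#include <fcntl.h>
#include <stdio.h>
#ifdef _WIN32
#include <io.h>      /* open, read, close in the Windows C runtime */
#else
#include <unistd.h>  /* read, close on POSIX systems */
#endif

/* O_BINARY only exists on Windows-like runtimes; make it a no-op elsewhere. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Hypothetical helper: read up to count bytes from filename in binary mode.
 * Returns the number of bytes read, or -1 if the file cannot be opened
 * or a read fails. */
int read_binary(const char *filename, void *buffer, unsigned int count)
{
    int fd, n = 0, total = 0;

    fd = open(filename, O_RDONLY | O_BINARY);
    if (fd < 0)
        return -1;

    /* read may return fewer bytes than requested, so loop until done. */
    while ((unsigned int)total < count &&
           (n = read(fd, (char *)buffer + total, count - total)) > 0)
        total += n;

    close(fd);
    return n < 0 ? -1 : total;
}
```

In the test program above, this would stand in for the open/read pair, for example read_binary("port.hash", buffer, 4640). If the count then comes back as 4640, text mode was the culprit; if it is still 124, the cause lies elsewhere.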
Over the last few days I took on the task of porting a tool written mainly in C and Perl from the usual GNU autoconf/automake/libtool tool chain to a Perl installation approach, in this case based on ExtUtils::MakeMaker. This decision can be discussed in a later post, but for now it is not the center of my... erm... complaint.
After releasing some beta packages and receiving Perl Testers results, I noticed that it was working perfectly on Linux and Mac systems (or at least, it appeared to be), and that some testers on Windows could compile only part of the package. These testers were using both Cygwin and Strawberry Perl.
For those who do not know, Cygwin is a complete set of GNU tools for the Microsoft Windows operating system. It includes everything, from a C compiler to KDE and Gnome. Strawberry Perl, on the other hand, is something recent. It is mostly a Perl interpreter and a MinGW C compiler (and a set of libraries). This makes it quite a bit lighter than Cygwin. Thus, I installed it on my VMware Windows image, and tried to port my module to Windows.
My first fight was against ExtUtils::CBuilder. Since it is a small, well-written Perl module, it was quite easy to make it work for my purposes. The second fight was compiling the C part for Windows. I do not understand why (and nobody has tried to explain it, although a lot of people on the Internet complain about it), but the MinGW C library does not include the 'sleep', 'mkstemp' or 'link' functions. It was a fight to find replacements for each of them. Quickly, for future reference: sleep was replaced by Sleep from windows.h, mkstemp was replaced by mktemp, and link was ignored. This took about one day of work.
Finally, it compiled, without warnings. That was gratifying. What wasn't gratifying was the fact that the program does not work. And I can't understand why.
In a few words, an explanation. With luck, someone who knows what is going on reads this blog (I am sure there is some hidden reader somewhere) and can explain it to me. First, I open a file with the 'open' function. This returns a file descriptor, which is an integer. I checked the value, and it was an integer: 3. It is weird that I ran the command more than once and the value was always 3. But probably Windows numbers file descriptors on a per-binary basis. Given that 'open' was working, I checked the next function: a 'read' of 6460 bytes (the size of a struct). Although the file being read is more than 1.9 MB, the read function returns the value 124. This means that 'read' was able to read just 124 bytes. Why?? Why in Hell???
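The sleep replacement can be sketched as a small wrapper. This is an illustration, not the code actually used in the port, and the name portable_sleep is hypothetical. The one trap is the unit: Sleep from windows.h takes milliseconds, while the POSIX sleep takes seconds.

```c
#ifdef _WIN32
#include <windows.h>  /* Sleep */
#else
#include <unistd.h>   /* sleep */
#endif

/* Hypothetical wrapper: pause for the given number of seconds.
 * Returns the number of seconds left unslept (always 0 on Windows,
 * where Sleep does not report an early wake-up). */
unsigned int portable_sleep(unsigned int seconds)
{
#ifdef _WIN32
    Sleep(seconds * 1000);  /* Sleep expects milliseconds */
    return 0;
#else
    return sleep(seconds);  /* POSIX sleep expects seconds */
#endif
}
```

Keeping the #ifdef inside one wrapper means the rest of the C code calls portable_sleep everywhere and stays free of platform checks.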
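About the descriptor always being 3: that part is probably expected. Descriptor tables belong to each process, a new process starts with 0, 1 and 2 already taken by standard input, output and error, and POSIX 'open' returns the lowest free number. That the Windows C runtime behaves the same way is an assumption, although it matches the value seen here. A small sketch, with the hypothetical helper name reopen_gets_same_fd:

```c
#include <fcntl.h>
#ifdef _WIN32
#include <io.h>      /* open, close in the Windows C runtime */
#else
#include <unistd.h>  /* close on POSIX systems */
#endif

/* Hypothetical helper: open a file, close it, and open it again.
 * Returns 1 if both opens gave the same descriptor number, 0 if they
 * differed, and -1 if the file could not be opened. */
int reopen_gets_same_fd(const char *filename)
{
    int first, second;

    first = open(filename, O_RDONLY);
    if (first < 0)
        return -1;
    close(first);  /* the number becomes free again */

    second = open(filename, O_RDONLY);
    if (second < 0)
        return -1;
    close(second);

    return first == second;
}
```

Each run of the program is a fresh process with a fresh table, which is why the first file opened gets the same number every time.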
I know that I have no choice. I will click OK, or click OK. But these messages drive me crazy. If there is nothing to say, what is this window for?

This is the keyboard inside KITT. Do you know it?
Yeah, I know, I am too much of a geek.
If you want to know how e-commerce is going in Portugal, just check flores.pt. I think I will not buy there again. In case they fix the bug in the meantime, check the screenshot below.