Category: research

Map-Reduce, or why I hate software patents.

Lately you have probably been hearing a lot about map-reduce. I first heard the term at last year’s Codebits. Although I wasn’t there, there was a talk with that title. I confess that, knowing that map and reduce are common functional operators in many programming languages, I did not look at the talk abstract. During this year’s Yet Another Perl Conference Europe, in Pisa, I saw a book on Hadoop, asked a friend who wanted to buy it what it was about, and he said: a framework to implement Map-Reduce.
 
That made me think: wait, this must be the name of something different from what I thought it was. Looking deeper, I understood the concept. Googling, I found that Google filed the patent application in 2004 and was granted the patent in 2010. I also found that I had used that construct in 2007 and documented it in my PhD thesis in 2008. Of course I did not call it Map-Reduce. In fact, I did not call it anything fancy. It was just a way to get results. I named it my “divide and conquer approach”. And I had not heard of Google’s approach either. I just arrived at it because I needed some results.
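For readers who, like me, only knew map and reduce as functional operators, here is a minimal sketch of the general pattern in Python. It is not the code from my thesis and not Google’s implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-counting task are assumptions chosen only for illustration, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict
from functools import reduce


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key, so each key is reduced once."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into a single result (here, a sum)."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


def word_count(chunks):
    """Divide the input into chunks, map each one, then reduce per key."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(pairs))


counts = word_count(["to be or not", "to be"])
```

Each chunk is mapped independently of the others, which is what makes the “divide” part easy to spread over several machines; only the grouping step needs to see all the intermediate pairs.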
 
So, this is yet another reason why I hate software patents.

TEI – Well Done!

I will not detail anything about TEI. Sorry. I would just like to let you know that every time I need to work with any TEI subset, I find myself amazed by the quality of its documentation and by the details its authors thought about before writing the standard.

Sometimes I catch myself thinking… do I really need all this stuff? The usual answer is no, I do not need so much detail in my annotations.

But that doesn’t mean I should not use TEI. I should probably look at the section about the items I am trying to annotate and reflect on it. I will probably not need all the different tags and details that TEI defines, but I am almost sure I will find one or two that I had not thought about. Then I can use the portion of TEI I really want and forget about the rest. My document will probably not validate against TEI, but it will not be too far off. And if someone else looks at the document, she will probably understand it. If she doesn’t, I can always point to the TEI documentation and say: I am not using it all, just the subset I thought relevant.

Where am I using TEI? You can see it being used in the Dicionário-Aberto project, where the dictionary is encoded in a TEI subset. Also, I am looking at the TEI header and filtering it, making it an option for annotating documents in a parallel corpora project.
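To give an idea of what filtering the header can look like, here is a rough sketch in Python using only the standard library. It is not the code from the parallel corpora project: the `filter_header` function, the `KEEP` set, and the sample document are assumptions made up for this example. The only things taken from TEI itself are the namespace and the names of the top-level header sections.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical choice of header sections worth keeping for a small project.
KEEP = {"fileDesc", "profileDesc"}


def filter_header(xml_text, keep=KEEP):
    """Parse a TEI document and drop header sections not listed in `keep`."""
    root = ET.fromstring(xml_text)
    header = root.find(f"{{{TEI_NS}}}teiHeader")
    if header is None:
        return root  # nothing to filter
    for child in list(header):  # copy the list, since we remove while iterating
        local_name = child.tag.split("}", 1)[-1]  # strip the "{namespace}" prefix
        if local_name not in keep:
            header.remove(child)
    return root


# Made-up sample document, just enough structure to exercise the function.
sample = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc><titleStmt><title>Example</title></titleStmt></fileDesc>
    <encodingDesc><p>Not needed here.</p></encodingDesc>
    <revisionDesc><change>Not needed either.</change></revisionDesc>
  </teiHeader>
  <text><body><p>Some text.</p></body></text>
</TEI>"""

filtered = filter_header(sample)
kept = [child.tag.split("}", 1)[-1]
        for child in filtered.find(f"{{{TEI_NS}}}teiHeader")]
```

The result is exactly the situation described above: a document that uses only part of the header, but whose remaining elements can still be looked up in the TEI documentation.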

DBLP Bibliography Database and Scientific Publications in Portugal

In Portugal, universities rate researchers according to whether or not their publications are listed in online article databases such as DBLP or ISI Web of Knowledge. Basically, if your article is not listed anywhere, it is class C. If it is listed in DBLP, it is class B. Finally, if it is present in ISI Web of Knowledge, it is class A.

That is, if you can persuade the DBLP maintainer to publish the information about a conference or a journal, you can get your article rated B. Then, if a commercial company (that is, ISI Web of Knowledge) includes your article, you get a class A article.
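Written down as code, the whole rating scheme as I have described it fits in a few lines. This is only a sketch of the rule as stated in this post: the function name `rate_article` and its two boolean parameters are made up for illustration, and the actual rules used by each university may have more cases.

```python
def rate_article(in_dblp, in_isi):
    """Return the class of an article based only on where it is listed."""
    if in_isi:
        return "A"  # present in ISI Web of Knowledge
    if in_dblp:
        return "B"  # listed in DBLP only
    return "C"      # not listed anywhere


ratings = [rate_article(False, False),
           rate_article(True, False),
           rate_article(True, True)]
```

Notice what is missing from the parameters: nothing about the article itself, or about the quality of the venue, enters the decision.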

I wonder how a single person can judge whether a journal is good or not across all areas (Michael Ley is doing a great job; that is not the problem). I do not know what Michael’s own research area is, but I do not believe he can discern which conferences or journals are good in Parallel Computation, Natural Language Processing, Bio-Informatics, Artificial Intelligence, etc.

Also, I wonder why there is a journal with a single issue published in DBLP, and without all of its articles listed. Yes, there is a journal that has more than thirty issues. Only one is in DBLP. And that one is not complete: just half the articles are listed.

Yes, I tried a couple of times (in fact, more than four times) to send the full information about that journal, and I offered to prepare the BibTeX entries for all journal issues. I never got an answer.

The same happened when I sent (twice) the index for a journal on Natural Language Processing for the Iberian Languages. No answer at all. Is it because it is a bad journal? Probably. But I do not think my mails were read at all.

I can make similar comments about ISI Web of Knowledge. Why is a company maintaining this index? Why is this index paid? If a journal or conference pays for its inclusion, do you think the company will reply that it does not have enough quality to be listed?

More questions can be asked. Check the number of conferences or journals on computer architecture. Then check the number of conferences or journals on Natural Language Processing. Then check how many of the conferences or journals in each of these areas are indexed. Yes, it is easier to be a GOOD researcher in computer architecture than in Natural Language Processing. Go figure why…