Behdad Esfahbod's daily notes on GNOME, Pango, Fedora, Persian Computing, Bob Dylan, and Dan Bern!

McEs, A Hacker Life
Sunday, August 28, 2005
 I HATE prelink

While making last-minute clean-ups for preload to handle deleted binaries and shared objects nicely, I discovered a disaster: prelink changes binaries every #^@&ing night :-(.

What happens then is that preload simply loses all the context information it has gathered about the binary, since there's a brand new binary in its place...

So I have to ignore the part of the file name after the magic string ".#prelink#.". What a hack...
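
Roughly, the workaround looks like this (a minimal sketch, not preload's actual code; the function name is mine):

#include <string.h>

/* Sketch: canonicalize a mapped file name by cutting everything
   from the ".#prelink#." marker on, so "libfoo.so.#prelink#.12345"
   and "libfoo.so" are treated as the same object. */
static void
strip_prelink_suffix (char *path)
{
  char *p = strstr (path, ".#prelink#.");
  if (p)
    *p = '\0';
}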

Thursday, August 25, 2005
 Google Talk

Like everybody else putting up their Google Talk id, here's mine: behdad.esfahbod. FFtAM with your common sense :-).

 Last days

The deadline for the Google Summer of Code is approaching rather fast, and I find myself hacking in the office for 48 hours, sleeping for 12, and repeating. The code is slowly getting into CVS; check preload.

Lots of good news from the Gtk+ team, and lots of releases too. It's good to see that all the "Gtk+-2.8 is slow/immature" hype has evaporated and we are offered a rock-solid toolkit once again. And of course there's Project Ridley too. Thanks Gtk+ team, thanks Matthias Clasen, thanks Owen Taylor, thanks Keith Packard, and thanks Carl Worth.

On another note, weirdly enough, in the past 48 hours I've received interview offers from Microsoft and Amazon. As I get nearer to finishing my Masters, I'm starting to worry about ending up in Seattle, which is unfortunate given my interest in Linux and Open Source, but there's not much I can do about that. I prefer the Boston conspiracy. Here is my résumé in case you are interested.

Friday, August 19, 2005
 It's all Love and Nothing More

Less than five hours after my call for Love(!) (for this bug), somebody sent them all over to me via email (GNOME has been experiencing hardware problems). I went to sleep after a day or two and woke up to find that somebody else had uploaded another full set of icons to the bug! Golden Hearts go to Nigel Tao and Francesco Sepic for their dedication, a Bronze Heart to Chris Scobell, and an Honorable Mention to Roger Svensson. (Now we are waiting for a good soul to hack up a patch :).)

jdub: Yeah, and Fedora is the ♥iest of Linux distros, so a red one. :P

(Such a gucharmap'ic post)

 Pigeons

What do you call a p.g.o member? A 'p.g.o'n:
$ grep ^'p.g.o'n$ /usr/share/dict/words
pigeon
(see also PigeonRank)

 GNOME Love Wanted

Here is a pretty educational way to start contributing to GNOME as well as getting a tour of Unicode; all that's needed is a recent GNOME with Evince installed and some free time:

Bug 313496: Add iconic script indicators to gucharmap

Yes!

Wednesday, August 17, 2005
 Arabic Joining Rendering Problem

I've been thinking about this weird problem where FarsiWeb fonts look bad at small sizes. The effect was that some glyphs seemed to be placed one pixel above where they should be, causing an unwanted shift where they join the neighboring glyphs. I discovered what's going on when I accidentally rendered a sample text at a huge size with inverse colors. A reworked sample is the image exhibited in this post. See it at full size here. If you don't see the problem, try enlarging the photo.

The problem is two-fold, and by far one of the hardest problems I have faced so far in Persian computing. In short: A) the glyphs in the font have tails spanning out of the glyph extents; B) when rendered with anti-aliasing on, the overlapped section becomes darker, due to being rendered twice.
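
To see why double-painting darkens the joint, consider the OVER operator: painting the same partial coverage twice accumulates ink. A toy calculation of my own (not renderer code) shows the effect:

#include <stdio.h>

/* Painting antialiased coverage `a` twice with OVER gives
   a + a*(1-a), which exceeds a for any 0 < a < 1 --
   hence the dark seam at the overlap. */
int
main (void)
{
  double a;
  for (a = 0.25; a < 1.0; a += 0.25)
    printf ("painted once: %.2f   painted twice: %.2f\n",
            a, a + a * (1.0 - a));
  return 0;
}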

I need to elaborate on why glyphs have those long tails in the first place. Since anti-aliased rendering is a relatively new phenomenon, and many older renderers have had rounding bugs and the like, Arabic font designers have mostly opted for glyphs that overlap, to make sure no white gap shows up at the joint. I have witnessed such a problem myself with DVI/PS/PDF viewers quite a few times while working on FarsiTeX. The solution, again, has been to mechanically add some extra tail to make sure no gap is rendered. But now, with anti-aliased rendering, it's causing problems.

Overlapping glyphs are what Keith Packard calls a broken font. It seems Xft already takes extra care when joining glyphs, pre-adding the joints such that they don't look weird. So all we need is fonts that are not broken, which is, well, reasonable. Seeking another fix may fail miserably, since otherwise you need to either 1) use conjoint operators to add the glyphs to a temporary surface and compose it to the destination afterward, but keithp says even that doesn't quite work, or 2) merge all the glyph paths and draw them in one operation. This one sucks performance-wise, since each glyph will be drawn from its path every time, instead of the usual fast path of having FreeType render glyph alpha masks that are cached on the server by Xft...

But there's one more reason to have broken fonts: we use that in our justification algorithm in Persian TeX systems. To implement proper Arabic justification in TeX, there are two ways: 1) use rules (black boxes); this is the easiest, but suffers from the same broken-viewers problem: the rules will be rendered with solid black edges, while the neighboring edges of the glyphs are antialiased; 2) use \cleaders to repeat a narrow joining glyph (see the sketch below). This eats a lot of resources in the final output and the renderer, but at least looks better. But since \cleaders inserts an integer number of those glyphs, we need to somehow fill the remainder, and we do that by having glyphs with joining tails extending out of their box. Is there another way to do justification without broken fonts? Maybe. With a PostScript backend, for example, one may be able to write PS hacks to convert a stretchable rule or glue into a scaled joining glyph. Or we may be able to add support for that to pdfTeX. They have recently added a bunch of new types, like floating point registers, etc. They are quite open to useful extensions. It may even be solved by PangoTeX; by the time that happens, Pango will definitely have support for Arabic justification :-).
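
For the curious, the \cleaders trick looks roughly like this in plain TeX (a sketch only; \kashida here is a hypothetical control sequence typesetting the narrow joining glyph, not code from any of our actual systems):

% Fill stretchable space by repeating a narrow joining glyph.
% \cleaders inserts a whole number of copies; the glyphs' own
% overlapping tails cover the fractional remainder.
\def\kashidafill{\cleaders\hbox{\kashida}\hfill}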

There is a partial third solution: use hinting to make sure the baseline of the font occupies a whole number of aligned pixels, such that double-painting doesn't make any difference. But that seems to be harder to achieve than just cutting the tails.

Monday, August 15, 2005
 Misc

Back from a week in Hamilton, Ontario, spent hacking with friends. Happened to meet desrt of GNOME IRC over coffee too. I've received some very encouraging feedback on my PangoTeX post, none of it technical though.

An interesting not-so-much-a-flame-war has been happening while I was recovering from accumulated jetlag over the past week. Hopefully we will have a much better panel, applets, and notification system at 10. To make Davyd happy: I've got a heck of a lot of applets on my panel after the recent installation of FC4, so it's not hard to imagine that I run out of panel space. I wish there was a way to hide the text in the "Applications Places Desktop" menus, which takes up a good twenty-something percent of my top panel space.

Best of color perception. I'm still amazed by the first one. The second is reasonable to handle, but the third one got me thinking that I would have been more surprised if it were otherwise! I mean, your brain filters out the yellow/blue cast before deciding what color the joint really is, but what you are comparing is the color after the filter is applied! It's like having a photo of car A, colored navy blue, taken on a dark night, and a photo of car B, colored deep dark blue, taken in daylight. Not surprisingly, your brain has no problem identifying car A as the lighter-colored one, but open them in GIMP and the two are the exact same color. Not surprised! It's this processing that keeps you from wondering why objects change color under different lighting conditions. Anyway, it's still surprising.

In Bob Dylan news, the No Direction Home set will be released just in time for my b-day too, so you've got a lot of choice(!), as the set consists of a DVD by Martin Scorsese, the soundtrack, and the book. Since there's no indication of my getting any of those DVDs as a gift, I'll go on and watch it on PBS right on my birthday :-). Ha ha.

I'm deeply busy with my Google Summer of Code project. A report to come soon. I've also decided to attend Google Code Jam 2005 and the online IOI 2005! I don't have any chance at Code Jam, since they don't allow C, only C++, Java, C#, and VB (yes, they require you to write classes with a given interface), but I'm curious to see how my abilities have changed after five years: whether I can still do a tricky backtrack, a heuristic, and a graph problem in five hours of contest time with enough accuracy... I guess I've improved in code quality and speed, but am not familiar with the latest heuristic tricks that circulate in the IOI community.

Finally, after having to type the URLs for pango, dasher, cairo, gnome, fedora, etc. too many times, I hacked up a script to automatically link them :-). The little PHP script, linkage, uses a list of patterns and URLs to hyperize my blog posts. No reason to use PHP really; Python could do it as well, but there are some things I still prefer PHP for, and heavy regexp work is one of them. The links file on my laptop grows with every blog post. My RSI-suffering fingers appreciate it a lot. The heavily linked PangoTeX post was the motivation to code this thing.

Saturday, August 13, 2005
 Pango+TeX Follow-Up

A few readers asked me to elaborate on why I think a Pango-enabled TeX is useful, how it would work, and what to anticipate. In this post I'm calling such a combination PangoTeX. To understand why PangoTeX is useful, we need to know what each of them is good at, and what not.

TeX is pretty good at breaking paragraphs into lines and composing paragraphs into pages. What it is not good at is complex text layout, meaning non-one-to-one character-to-glyph mapping and more than one text direction. e-TeX has primitives to set text direction, but not more.

Pango, on the other hand, knows nothing about pages and columns. It does break paragraphs too, but nothing to envy. Patches exist that implement the TeX h&j algorithm for Pango, but that doesn't matter here. What Pango is pretty good at is the character-to-glyph mapping, where it implements the OpenType specification for quite a bunch of scripts. Pango contains modules for the following scripts: Arabic, Hebrew, Indic, Khmer, Tibetan, Thai, and Syriac, and a Burmese module was recently proposed. Other than that, it has a module to use Uniscribe on Windows, and there's also a module available on the internet (which may be integrated into Pango soon) to use the SIL Graphite engine.

So the plan is to make TeX pass streams of characters to Pango and ask it to shape them. The way XeTeX is implemented, TeX passes words of text to the higher-level rendering engine, and all it asks for is the width the word would occupy. XeTeX has backends for Apple ATSUI and ICU at this time. Pango has the advantage of abstracting OpenType in general, Uniscribe, Graphite, and hopefully ATSUI in the future. So it would be enough to have only a Pango backend.
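
The TeX-side hook might look something like this with Pango (a sketch under my own assumptions: the function is hypothetical, not existing XeTeX or Pango API, and a real engine would reuse the layout across calls):

#include <pango/pango.h>

/* Hand Pango a word; get back the width it would occupy,
   in Pango units, by way of a throwaway PangoLayout. */
static int
word_width (PangoContext *context, const char *word)
{
  PangoLayout *layout = pango_layout_new (context);
  int width, height;

  pango_layout_set_text (layout, word, -1);
  pango_layout_get_size (layout, &width, &height);
  g_object_unref (layout);

  return width;
}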

That level of integration is pretty much what XeTeX does, which is quite useful on its own, but it doesn't have to be the end of it. Much more can be done by using Pango's language/script detection features, its bidirectional handling engine, etc., such that you don't have to mark left-to-right and right-to-left runs manually. Moreover, while doing this, we would introduce the Unicode Character Database to TeX, such that (for example) character category codes would be automatically set for the whole BMP range, and you could query other properties of characters should the need arise.
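
For example, default category codes could be derived from UCD general categories, something like this hypothetical sketch using GLib's Unicode functions (the particular catcode mapping is mine, not any spec's):

#include <glib.h>

/* Derive a default TeX catcode from the UCD instead of
   \catcode'ing characters one by one. */
static int
default_catcode (gunichar ch)
{
  if (g_unichar_isalpha (ch))
    return 11;  /* letter */
  if (g_unichar_isspace (ch))
    return 10;  /* space */
  return 12;    /* other */
}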

The way Omega approached the problem of Unicode+TeX was to add a push-down automaton layer that could convert the character stream at as many stages as desired. So you could have an input layer converting from legacy character sets to Unicode, then a complex shaping engine, and finally a conversion to font encoding. The problem with this approach is that it's very complex, so it introduced a zillion bugs. Of course, bugs can be fixed, but then comes the next problem: duplication. The powerful idea of having shaping information in OpenType fonts was left unused there. For each font you had to implement the shaping logic (ligatures, etc.) in an Omega Translation Process (OTP) file. Moreover, the whole machinery was more like Apple's AAT than OpenType, which means it has no support for individual scripts: if you want to do Arabic shaping, you have to code all the joining logic in OTPs. Neither did it provide Unicode character properties: if you need to know whether a character is a non-spacing mark, you have to list all NSMs in an OTP file. If you wanted to normalize the string, well, you had to code normalization in OTPs, which is quite possible and interesting to code, but don't ask me about performance... Putting all this together, I believe that Omega cannot become a unified Unicode rendering engine without importing support from outside libraries. And once you do import some support from Pango and gNUicode, for example, all of a sudden you don't need all those push-down automata anymore. A charset-conversion input layer that uses iconv is desirable though.

About the output layer: XeTeX generates an extended DVI and converts it to PDF afterwards, using a backend-specific extended DVI driver. We can do that with Pango+Cairo too, writing to a PDF or PS backend. Or, since Pango computes glyph strings when analyzing the text, we may not even need Pango when converting the DVI to PDF. Anyway, what I'm more interested in is extending pdfTeX directly (see the sketch below). We don't really need DVI these days. As I said before, the assumptions that Knuth made have proved wrong in the new millennium. It's not like the same DVI renders the same everywhere; no, you need the fonts. That's why the fonts used in today's TeX systems are separate from the fonts you use to render your desktop to your screen: they are isolated and packaged separately in a TeX distribution, such that almost everyone has the same set of fonts... This should change too, with only PDF output and some kpathsea configuration. You would still need the fonts to compile the TeX sources, but the output would be portable.
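
A sketch of that output path with Pango+Cairo, assuming cairo's PDF backend and the pango-cairo glue (the file name, page size, and text are placeholders of mine):

#include <pango/pangocairo.h>
#include <cairo-pdf.h>

/* Render one shaped line straight into a PDF; sizes in points. */
int
main (void)
{
  cairo_surface_t *surface = cairo_pdf_surface_create ("out.pdf", 595, 842);
  cairo_t *cr = cairo_create (surface);
  PangoLayout *layout = pango_cairo_create_layout (cr);

  pango_layout_set_text (layout, "Hello, PangoTeX!", -1);
  cairo_move_to (cr, 72, 72);
  pango_cairo_show_layout (cr, layout);

  cairo_show_page (cr);
  g_object_unref (layout);
  cairo_destroy (cr);
  cairo_surface_destroy (surface);
  return 0;
}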

To conclude, I have changed my mind about cleaning up and adding UCD support to Omega, and believe that we badly need a pdfTeX+Pango engine to go with our otherwise-rocking GNOME desktop.

That's all for now. I'm very much interested in getting some feedback.

Tuesday, August 09, 2005
 WiFi Speed Spray

As advertised on eBay.

Monday, August 08, 2005
 Pass by Reference in C

While reading setjmp.h I figured out a nice trick used by glibc hackers to implement pass-by-reference data types. It was like: Wow! Cool! Beautiful! Automatic memory allocation, with no reference operator when passing to functions. This has indeed been part of the design of C, and exactly why array types were added to C, yet I had never figured out that arrays can be used to implement opaque types that are passed by reference.
#include <stdio.h>

/* A one-element array type: `the_type v` declares an actual struct
   with automatic storage, but decays to a pointer when passed to a
   function, so callees see (and can modify) the caller's object with
   no explicit `&` at the call site.  This is how glibc declares
   jmp_buf in setjmp.h. */
typedef struct __dummy_tag {
  int x;
} the_type[1];


void
setter (the_type i, int v)
{
  i->x = v;      /* `i` has decayed to a struct pointer */
}

int
getter (the_type i)
{
  return i->x;
}


int
main (void)
{
  the_type v;    /* allocated automatically, like a plain struct */

  setter (v, 10);
  printf ("%d\n", getter (v));

  setter (v, 20);
  printf ("%d\n", getter (v));

  return 0;
}

You learn new things every day...

Update: Apparently blogger has !@#$ed up my post. Kindly read the code syntax-highlighted and cutely indented on my blog.

Update 2: Originally I blamed planet instead of blogger in the previous update. After a few failed attempts at strikethrough on planet, I simply replaced it with blogger.

Tuesday, August 02, 2005
 Back

So I did a little hacking to mirror my feed, such that blogger does not get upset about planet fetching it too often. In the meantime, I was at DDC and OLS and did a lot of hacking on dasher, cairo, and pango. As a result, I'm a cairo programmer (not developer) now. Finally moving on to preload. GIMPed hackergotchis for hub and jg and sent them to jdub too. Man, put 'em up.

I just found XeTeX yesterday. After reading the paper, I believe they are on the right track, as far as the comparison to Omega is concerned. Several fundamental assumptions of Knuth's work and era have been proven wrong, and archivability through code-freeze is no exception. I think the comparison between a self-contained system like TeX/Omega and the XeTeX approach is exactly like C versus Java.

In other words, we can take the idea, plug Pango and Cairo in, and get a decent, portable, i18n'ized TeX-based typesetting system. No more font and complex-text-layout code duplication... WOW!

Interesting link stolen from jwz: WORLDPROCESSOR Catalog.