Satyr handles paths. There are some problems with paths and (sigh) encodings. Of those, here are two: there's no way to know which encoding the filenames in a filesystem are encoded in (f.i., there's no way to ask the filesystem), and even if that were possible, the filenames might not even be encoded in that encoding. In these (still!) transitional times, lots and lots and shitloads of filesystems are used in UTF-8 environments, but some filenames are still in old ISO-8859-1 or whatever the system was using before.

Then comes QString. I'm taking a path from the command line; this path is the location of the (right now only) Collection for the player. I'm handling the command line using KCmdLineOptions, which returns QStrings. As we all know, QString, just like the unicode type in Python, handles all the data internally as Unicode, which is The Right Thing™. If you really need the internal data, say as bytes, you can always call the constData() method and be happy with it[1]. This would be the case for paths; you need the bytes.

Then comes PyQt4. For some reason, which maybe I will ask about on the pyqt devel ML[2], constData() is not available. What to do? Well, that's what this post is about. What you're about to read is as hacky as it can be, but then it works. I might feel dirty, but I can live with it. As long as I mark it as an utter/über hack and promise to revert it once that's possible...

from PyQt4.QtCore import QByteArray

# path is a QString
qba= QByteArray ()
qba.append (path)
path= str (qba)
# now path is a str, that is, a plain string of bytes (Python 2).

Even with this part of the bug fixed, Phonon.MediaSource either fails when fed that same path, with this message:

ERROR: backend MediaObject reached ErrorState after  1 . It seems a KioMediaStream will not help here, trying anyway.

or simply refusing to continue. Sure, in my case I should simply ignore the filename and inform the user what's going on, but sometimes you can't be so gentle.

satyr pykde python phonon

[1] blah.

[2] But then it doesn't make much sense now, since Phil wants to get rid of QString (for several reasons, which quite possibly include this one).

Posted Wed 27 Jan 2010 11:55:55 PM CET Tags: phonon

More than two months ago I globed about QStrings and paths. The problem was this: my app accepts paths via the command line, which is processed via KCmdLineOptions, which in turn converts everything to QStrings. What I wanted were paths, which are more like QByteArrays than QStrings (because the latter have a Unicode representation internally; more on that later). Adding PyQt4 to the equation forced me to resort to QByteArray to get the path as a str instead of using QString.constData() (PyQt4 doesn't export that function). But that's only the beginning of the problem.

Take for instance this situation. I have a music collection that I've been building for years now (more than 10, I think). In the old times of this collection the filenames were encoded in iso-8859-1. Then the future came and converted all my machines to utf-8. But only the software; the filesystems were in one way or another inherited from system to system, from machine to machine. So I ended up with a mixture of utf and iso filenames, to the point where I have a file whose filename is in iso, but the directory it lives in is in utf. Yes, I know, it is a mess. But if I take any decent media player, I can play the file all right. That's because the filesystem knows nothing about encodings (otherwise it would reject badly encoded filenames).
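To make that mess concrete, here is a minimal sketch in plain Python 3 (not the Python 2 and PyQt4 that satyr uses). The directory and file names are made up for illustration. It builds a path whose directory part is UTF-8 and whose file part is ISO-8859-1, and shows that no single codec describes the whole thing, while the raw bytes themselves are perfectly fine.

```python
# Hypothetical names: the directory is encoded in UTF-8, the file in ISO-8859-1.
dir_name = "Último bondi".encode("utf-8")
file_name = "La pequeña novia.wav".encode("iso-8859-1")
path = dir_name + b"/" + file_name
# path is now b'\xc3\x9altimo bondi/La peque\xf1a novia.wav'

# Decoding the whole path as UTF-8 fails on the ISO-8859-1 byte 0xF1.
try:
    path.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# ISO-8859-1 accepts any byte, but turns the UTF-8 part into mojibake.
as_latin1 = path.decode("iso-8859-1")

# The 'surrogateescape' error handler smuggles the undecodable bytes through
# a text string, so encoding back gives exactly the original bytes.
as_text = path.decode("utf-8", "surrogateescape")
round_tripped = as_text.encode("utf-8", "surrogateescape")
```

The point of the sketch is only that the bytes are the ground truth; any text view of them is either lossy, wrong, or needs a trick like surrogateescape.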

I just spent last Saturday making sure that satyr only stored filepaths in strs, not unicodes or QStrings. It took concentration, but with just a bunch of classes and only 3 or 4 points where the filepaths are managed, it wasn't that difficult. Still, it took a day. But then, as I mentioned in that post, Phonon is not able to play such files... or so I thought.

If you run satyr after executing export PHONON_XINE_DEBUG=1 you'll see a lot of Phonon debug info in the console (not that there is another way to run satyr right now anyways). Among all that info you'll see lines such as these two:

void Phonon::Xine::XineStream::setMrl(const QByteArray&, Phonon::Xine::XineStream::StateForNewMrl) ...
bool Phonon::Xine::XineStream::xineOpen(Phonon::State) xine_open succeeded for m_mrl = ...

If you're sharp enough (I'm not; sandsmark from #phonon had to tell me) you'll note the mention of MRLs. MRLs are xine's URLs for media. Like any URL, they can (and most of the time must) encode 'strange' characters with the so-called "percent encoding". This means that no matter what encodings the different parts of a filepath are in, I just add file:// at the beginning and then I can safely encode it, escaping non-ascii characters to %xx representations... or that's what the theory says. One thing to note is that the file:// part must not be escaped; xine complains that the file does not exist in that case.
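As an illustration of the theory, here is a small sketch using Python 3's urllib.parse instead of the Qt classes (that substitution, and the /home/user prefix, are assumptions for the example). It percent-encodes a byte path with mixed encodings, keeps the scheme out of the escaping, and also shows what happens when the scheme is escaped too.

```python
from urllib.parse import quote

# Hypothetical path: UTF-8 directory name, ISO-8859-1 file name.
raw_path = b"/home/user/\xc3\x9altimo bondi/07- La peque\xf1a novia.wav"

# Escape only the path, leaving '/' alone, then prepend the scheme untouched.
mrl = "file://" + quote(raw_path, safe="/")

# Escaping everything, scheme included, mangles the 'file://' prefix.
over_encoded = quote(b"file://" + raw_path, safe="")
```

Here mrl starts with a literal file:// followed by %xx escapes for every non-ascii byte, whatever encoding it came from, while over_encoded starts with file%3A%2F%2F and is no longer recognizable as a URL with a scheme.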

Looking for help in Qt's classes one can find QUrl and the already known QByteArray. I can call QByteArray.toPercentEncoding() on my str and feed the result to QUrl.fromPercentEncoding() (which strangely returns a QString, which is exactly what we're avoiding) or QUrl.fromEncoded(). But then toPercentEncoding() encodes too much, replacing :// with %3A%2F%2F. No fun.

Ok, let's try creating a QByteArray with only the file:// and then append() the toPercentEncoding() of the path only. It works:

PyQt4.QtCore.QByteArray('file://%2Fhome%2Fmdione...%2F%C3%9Altimo%20bondi%20a%20Finisterre%2F07-%20La%20peque%F1a%20novia%20del%20carioca.wav')

But then calling QUrl.fromEncoded() gives:

PyQt4.QtCore.QUrl("file://xn--/home/mdione.../ltimo bondi a finisterre/07- la pequea novia del carioca-wkmz60758d.wav")

The URL got somehow puny-encoded, which of course xine doesn't recognize for local files.
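For those who haven't met it, the xn-- prefix is what internationalized domain names look like once encoded (IDNA/punycode). My guess, and it is only a guess, is that everything after file:// was taken for a host name. A minimal sketch with Python 3's built-in idna codec (not Qt, and with a made-up label) shows the same symptoms: the non-ascii character vanishes from the readable part and ends up encoded in a suffix.

```python
# A made-up 'host name' label containing one non-ascii character.
label = "pequeña"

# The idna codec produces the ASCII-compatible form, prefixed with 'xn--'.
ace = label.encode("idna")

# Decoding the ASCII form gives back the original label.
decoded = ace.decode("idna")
```

The encoded form begins with xn--pequea-, which matches the 'pequea' seen in the QUrl output above.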

Another option is to create an empty QUrl and call setEncodedUrl() with the ParsingMode set to QUrl.StrictMode, so we avoid the 50 lines of code starting here[1] that try to escape everything all over again (and I already had some double-or-even-triple-encoding nightmares parsing RSS/Atom feeds last year, thank you), but we get puny-encoded again (maybe it is 'pwny-encoded'?).

Last resort: backtrack to the point where we created only one QByteArray with the path and call toPercentEncoding(); feed that to the setEncodedPath() method of an empty QUrl. Then we add the last piece by calling setScheme('file') and we're ready! Of course we're not:

PyQt4.QtCore.QByteArray('file:%2Fhome%2Fmdione...%2F%C3%9Altimo%20bondi%20a%20Finisterre%2F07-%20La%20peque%F1a%20novia%20del%20carioca.wav')

Notice the lack of the two // after file:? xine doesn't like it; hence, I don't either.

Ok, this post got too long. I hope I can resolve this soon; I already spent too much time on it. At least a good part of it was explaining it, so others don't have to suffer the same as I did.

BTW, satyr will shortly be released, whether I fix this bug or not.

satyr pykde phonon

[1] Look at the size of that file! 6k lines to handle URLs! Who would have said it was so difficult... Once more I'm reminded of how lucky I am to have these libraries at the tips of my fingers, yay!


I certainly hope this is the last post in the Phonon-and-badly-encoded/mixed-encodings-filenames saga, but I know it's just wishful thinking: like all encoding-related problems, they never really disappear, it's just that you haven't hit the right wrong stone yet. In any case, I fixed all my later problems wherever they were, and now I can answer this question: how to play files whose filenames are badly encoded and/or have mixed encodings, all this in Phonon?

Right now the answer is: you have to provide a properly encoded QUrl. How, you might ask, can I get one of those? Are they sold in the same odd-looking places where you can buy cigarettes, or even marijuana[1]? The answer, luckily, is way simpler.

Putting together all the code I've been showing about Python, PyQt4/PyKDE4 and Phonon recently, it comes down to this[5]:

from PyQt4.QtCore import QByteArray, QUrl

# path is a str()
qba= QByteArray (path)
# excluding "/" and " " from the escaping is not needed,
# but the result is cleaner if you print it
qu= QUrl.fromEncoded (qba.toPercentEncoding ("/ "))
# this is needed by the gstreamer backend[3],
# and the xine backend doesn't complain
qu.setScheme ('file')

... and that's it. You can now create a MediaSource with this qu.
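If you want to convince yourself that nothing is lost along the way, here is a sketch of the same idea with Python 3's standard library only (so no Qt, no Phonon; the helper names and the sample path are made up). It turns a byte path into a file URL and back, and the bytes that come out are the bytes that went in, whatever their encoding was.

```python
from urllib.parse import quote, unquote_to_bytes, urlsplit


def path_to_file_url(raw_path: bytes) -> str:
    """Percent-encode a byte path and put the 'file' scheme in front."""
    return "file://" + quote(raw_path, safe="/")


def file_url_to_path(url: str) -> bytes:
    """Recover the original bytes from the path component of a file URL."""
    return unquote_to_bytes(urlsplit(url).path)


# Hypothetical path: UTF-8 directory name, ISO-8859-1 file name.
raw_path = b"/home/user/\xc3\x9altimo bondi/07- La peque\xf1a novia.wav"
url = path_to_file_url(raw_path)
recovered = file_url_to_path(url)
```

Since the URL is pure ascii, it can travel inside any string type without anybody trying to be clever about encodings.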

There are a couple of ideas that I want to express as conclusion to all this:

  • In an ideal world these things should not happen. But this is one of the lesser problems with this non-ideal world, so bear with it.
  • Paths should not be stored in QStrings, even if they can (and they do) store this kind of pathname, because if you try to 'encode' their contents (in the Unicode sense; that is, convert them to an encoding like UTF-8[4]) you get farts or barks at best. Yes, you always have constData(), but QString's class reference gives no guarantee that this will keep being the case[6].
  • In fact, QString's class reference says at some point: «[one case] where QByteArray is appropriate are when you need to store raw binary data...», and [as I already wrote]( in-pyqt/), «[t]his would be the case for paths; you need the bytes».
  • QFile and QDir can only be created from QStrings. I'm not sure if, given all I wrote, that's right.

The good news is that satyr can now play any file that the backends can, whatever its filename-as-string-of-bytes is. I'm a little bit happier about it, I got another contribution to KDE, and I might even get to close a lot of bugs!

satyr pykde python phonon

[1] That question is only legal in the Netherlands[2] and very few other places on the planet.

[2] Actually, it is not legal. See [this wikipedia article]( enforcement).

[3] I might pull up my sleeves again and fix that.

[4] You might already know this, but if you don't: you cannot print Unicode, because Unicode is not an encoding. You have to encode it first. Hence the toLatin1(), toUtf8() and similar QString methods, and also the inverse from*() ones.

[5] Of course the equivalent C++ code also works, with path being a char *.

[6] And in the case of PyQt4, that method is not even available. But [I already globed about it]( pyqt/).
