Discussion:
[I18n-sig] python implementation of unicode collation algorithm
James Tauber
19 years ago
I've made a start on a pure python implementation of the Unicode
Collation Algorithm (UTS #10) but I thought I'd best check with this
SIG whether such a thing already exists.

James
--
James Tauber http://jtauber.com/
journeyman of some http://jtauber.com/blog/
M.-A. Lemburg
19 years ago
Post by James Tauber
I've made a start on a pure python implementation of the Unicode
Collation Algorithm (UTS #10) but I thought I'd best check with this
SIG whether such a thing already exists.
Not that I'm aware of.

Note that given the sizes of the collation tables, it's probably
better to have them defined in a C module, rather than a Python
data structure.
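
To make the size concern concrete, here is a rough sketch of what holding the table in a Python data structure could look like. It assumes the `allkeys.txt` line layout published with UTS #10 (code points, a semicolon, then bracketed collation elements); the function names and the sample weights are illustrative only and are not taken from any existing module.

```python
import re

# One collation element looks like [.1C47.0020.0008] or [*0209.0020.0002];
# '*' marks a variable element, '.' a non-variable one.
ELEMENT_RE = re.compile(r"\[([.*])([0-9A-Fa-f.]+)\]")


def parse_table_line(line):
    """Parse one allkeys.txt-style line into (code_points, elements), or None."""
    line = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not line or line.startswith("@"):  # blank line or directive such as @version
        return None
    chars, _, weights = line.partition(";")
    code_points = tuple(int(cp, 16) for cp in chars.split())
    elements = [
        (flag == "*", tuple(int(w, 16) for w in body.split(".")))
        for flag, body in ELEMENT_RE.findall(weights)
    ]
    return code_points, elements


def load_table(lines):
    """Build a dict mapping code point tuples to lists of collation elements."""
    table = {}
    for line in lines:
        parsed = parse_table_line(line)
        if parsed is not None:
            code_points, elements = parsed
            table[code_points] = elements
    return table


# Illustrative sample lines; the weights are NOT guaranteed to match any DUCET version.
SAMPLE = """\
@version 0.0.0
0020 ; [*0209.0020.0002] # SPACE
0041 ; [.1C47.0020.0008] # LATIN CAPITAL LETTER A
0061 ; [.1C47.0020.0002] # LATIN SMALL LETTER A
""".splitlines()

table = load_table(SAMPLE)
```

Every entry becomes a tuple key plus a list of nested tuples, so a full default table with many thousands of entries turns into a large number of small Python objects, which is the overhead a C module would avoid.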
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jan 23 2006)
Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
James Tauber
19 years ago
Post by M.-A. Lemburg
Post by James Tauber
I've made a start on a pure python implementation of the Unicode
Collation Algorithm (UTS #10) but I thought I'd best check with this
SIG whether such a thing already exists.
Not that I'm aware of.
Note that given the sizes of the collation tables, it's probably
better to have them defined in a C module, rather than a Python
data structure.
Yes, this is certainly true of the DUCET (the Default Unicode Collation
Element Table), although language-specific collation element tables
would be more manageable as Python data.

I'll probably start with a pure Python implementation and then take
it from there (or let someone with better C extension experience
optimize it).
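
As a rough illustration of where a pure Python implementation might start, the sketch below builds a three-level sort key from a tiny hand-written table. It is not the implementation discussed in this thread: `TINY_TABLE`, `sort_key`, and all the weights are hypothetical, and the sketch leaves out normalization, contractions, variable weighting, and implicit weights, all of which UTS #10 requires.

```python
# Hypothetical miniature table: character -> list of
# (primary, secondary, tertiary) collation elements. Weights are made up.
TINY_TABLE = {
    "a": [(0x1C47, 0x20, 0x02)],
    "A": [(0x1C47, 0x20, 0x08)],
    "b": [(0x1C60, 0x20, 0x02)],
    "B": [(0x1C60, 0x20, 0x08)],
    # a with acute accent: base letter followed by an accent element
    # that has a zero (ignorable) primary weight.
    "\u00e1": [(0x1C47, 0x20, 0x02), (0x0000, 0x24, 0x02)],
}


def sort_key(text, table=TINY_TABLE):
    """Return a tuple usable as a sort key for text.

    All primary weights come first, then a 0 separator, then all
    secondary weights, then a 0 separator, then all tertiary weights.
    Zero weights are skipped within each level.
    """
    elements = []
    for ch in text:
        elements.extend(table[ch])  # KeyError for characters outside the tiny table

    key = []
    for level in range(3):
        if level:
            key.append(0)  # level separator
        key.extend(ce[level] for ce in elements if ce[level] != 0)
    return tuple(key)


words = ["b", "B", "a", "\u00e1", "A", "ab", "Ab"]
ordered = sorted(words, key=sort_key)
print(ordered)
```

Because the levels are laid out one after another in the key, a difference in base letters outranks an accent difference, which in turn outranks a case difference. A real table would replace `TINY_TABLE` with data loaded from the DUCET or a language-specific tailoring.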

James
Jim Fulton
19 years ago
Post by James Tauber
I've made a start on a pure python implementation of the Unicode
Collation Algorithm (UTS #10) but I thought I'd best check with this
SIG whether such a thing already exists.
I'm not aware of any pure python implementations.

I've created a Pyrex-based C wrapper of the ICU collation library at:

http://svn.zope.org/zope.ucol/trunk/

You don't need Pyrex to use this, and there is a distutils
setup script to install it.

I'd be happy to make an official release of this if anyone is
interested.

There is also a SWIG-based C++ wrapper of a much larger portion of the
ICU library, including collation, at:

http://pyicu.osafoundation.org/

This requires SWIG, hand editing of makefiles, and dynamic-library
machinations, which is in large part why I ended up writing my own
wrapper.

Jim
--
Jim Fulton mailto:jim at zope.com Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org
James Tauber
19 years ago
Post by Jim Fulton
Post by James Tauber
I've made a start on a pure python implementation of the Unicode
Collation Algorithm (UTS #10) but I thought I'd best check with
this SIG whether such a thing already exists.
I'm not aware of any pure python implementations.
I've created a pyrex-based C wrapper of the ICU collation library at:
http://svn.zope.org/zope.ucol/trunk/
You don't need pyrex to use this and there is a distutils
setup script to install it.
I'd be happy to make an official release of this if anyone is
interested.
I'd like to see an official release, even if I do end up doing a pure
Python implementation myself.

James
--
James Tauber http://jtauber.com/
journeyman of some http://jtauber.com/blog/
