class FriendlyId::SlugString
This class provides some string-manipulation methods specific to slugs. Its Unicode support is provided by ActiveSupport::Multibyte::Chars; this is needed primarily for Unicode encoding normalization and proper calculation of string lengths.
Note that this class includes many “bang methods” such as {#clean!} and {#normalize!} that perform actions on the string in-place. Each of these methods has a corresponding “bangless” method (e.g., +SlugString#clean!+ and +SlugString#clean+) which does not appear in the documentation because it is generated dynamically.
All of the bang methods return an instance of String, while the bangless versions return an instance of FriendlyId::SlugString, so that calls to methods specific to this class can be chained:
string = SlugString.new("hello world")
string.with_dashes! # => "hello-world"
string.with_dashes  # => <FriendlyId::SlugString:0x000001013e1590 @wrapped_string="hello-world">
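The source for the dynamically generated bangless methods is not shown on this page, so the following is only a sketch of one way such methods could be derived from their bang counterparts. +TinySlug+ is a hypothetical stand-in class, not part of FriendlyId, and the use of +define_method+ is an assumption about the mechanism.

```ruby
# Hypothetical stand-in for SlugString; not the library's implementation.
class TinySlug
  def initialize(string)
    @wrapped_string = string.to_s
  end

  # Bang method: mutates the receiver and returns a plain String.
  def clean!
    @wrapped_string = @wrapped_string.gsub(/\A-|-\z/, "").gsub(/\s+/, " ").strip
  end

  # Bang method: mutates the receiver and returns a plain String.
  def with_dashes!
    @wrapped_string = @wrapped_string.gsub(/[\s\-]+/, "-")
  end

  # For every bang method defined so far, define a bangless twin that
  # works on a copy and returns that copy, so calls can be chained.
  instance_methods(false).grep(/!\z/).each do |bang|
    define_method(bang.to_s.chomp("!")) do |*args|
      copy = self.class.new(@wrapped_string)
      copy.send(bang, *args)
      copy
    end
  end

  def to_s
    @wrapped_string
  end
end

slug = TinySlug.new("  hello   world ")
chained = slug.clean.with_dashes   # bangless calls chain on TinySlug copies
puts chained                       # prints the chained result via to_s
```

Because each bangless method returns a new object of the same class, the receiver is left untouched, while the bang methods hand back the underlying String.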
@see www.utf8-chartable.de/unicode-utf8-table.pl?utf8=dec Unicode character table
@see ::dump_approximations
Constants
- APPROXIMATIONS
All values are Unicode decimal characters or character arrays.
Public Class Methods
This method can be used by developers wishing to debug the {APPROXIMATIONS} hashes, which are written in a hard-to-read format.
@return Hash
@example
> ruby -rrubygems -rlib/friendly_id -e 'p FriendlyId::SlugString.dump_approximations'
{:common => {"À"=>"A", "Á"=>"A", "Â"=>"A", "Ã"=>"A", "Ä"=>"A", "Å"=>"A", "Æ"=>"AE", "Ç"=>"C", "È"=>"E", "É"=>"E", "Ê"=>"E", "Ë"=>"E", "Ì"=>"I", "Í"=>"I", "Î"=>"I", "Ï"=>"I", "Ð"=>"D", "Ñ"=>"N", "Ò"=>"O", "Ó"=>"O", "Ô"=>"O", "Õ"=>"O", "Ö"=>"O", "×"=>"x", "Ø"=>"O", "Ù"=>"U", "Ú"=>"U", "Û"=>"U", "Ü"=>"U", "Ý"=>"Y", "Þ"=>"Th", "ß"=>"ss", "à"=>"a", "á"=>"a", "â"=>"a", "ã"=>"a", "ä"=>"a", "å"=>"a", "æ"=>"ae", "ç"=>"c", "è"=>"e", "é"=>"e", "ê"=>"e", "ë"=>"e", "ì"=>"i", "í"=>"i", "î"=>"i", "ï"=>"i", "ð"=>"d", "ñ"=>"n", "ò"=>"o", "ó"=>"o", "ô"=>"o", "õ"=>"o", "ö"=>"o", "ø"=>"o", "ù"=>"u", "ú"=>"u", "û"=>"u", "ü"=>"u", "ý"=>"y", "þ"=>"th", "ÿ"=>"y", "Ā"=>"A", "ā"=>"a", "Ă"=>"A", "ă"=>"a", "Ą"=>"A", "ą"=>"a", "Ć"=>"C", "ć"=>"c", "Ĉ"=>"C", "ĉ"=>"c", "Ċ"=>"C", "ċ"=>"c", "Č"=>"C", "č"=>"c", "Ď"=>"D", "ď"=>"d", "Đ"=>"D", "đ"=>"d", "Ē"=>"E", "ē"=>"e", "Ĕ"=>"E", "ĕ"=>"e", "Ė"=>"E", "ė"=>"e", "Ę"=>"E", "ę"=>"e", "Ě"=>"E", "ě"=>"e", "Ĝ"=>"G", "ĝ"=>"g", "Ğ"=>"G", "ğ"=>"g", "Ġ"=>"G", "ġ"=>"g", "Ģ"=>"G", "ģ"=>"g", "Ĥ"=>"H", "ĥ"=>"h", "Ħ"=>"H", "ħ"=>"h", "Ĩ"=>"I", "ĩ"=>"i", "Ī"=>"I", "ī"=>"i", "Ĭ"=>"I", "ĭ"=>"i", "Į"=>"I", "į"=>"i", "İ"=>"I", "ı"=>"i", "Ĳ"=>"IJ", "ĳ"=>"ij", "Ĵ"=>"J", "ĵ"=>"j", "Ķ"=>"K", "ķ"=>"k", "ĸ"=>"k", "Ĺ"=>"L", "ĺ"=>"l", "Ļ"=>"L", "ļ"=>"l", "Ľ"=>"L", "ľ"=>"l", "Ŀ"=>"L", "ŀ"=>"l", "Ł"=>"L", "ł"=>"l", "Ń"=>"N", "ń"=>"n", "Ņ"=>"N", "ņ"=>"n", "Ň"=>"N", "ň"=>"n", "ŉ"=>"'n", "Ŋ"=>"NG", "ŋ"=>"ng", "Ō"=>"O", "ō"=>"o", "Ŏ"=>"O", "ŏ"=>"o", "Ő"=>"O", "ő"=>"o", "Œ"=>"OE", "œ"=>"oe", "Ŕ"=>"R", "ŕ"=>"r", "Ŗ"=>"R", "ŗ"=>"r", "Ř"=>"R", "ř"=>"r", "Ś"=>"S", "ś"=>"s", "Ŝ"=>"S", "ŝ"=>"s", "Ş"=>"S", "ş"=>"s", "Š"=>"S", "š"=>"s", "Ţ"=>"T", "ţ"=>"t", "Ť"=>"T", "ť"=>"t", "Ŧ"=>"T", "ŧ"=>"t", "Ũ"=>"U", "ũ"=>"u", "Ū"=>"U", "ū"=>"u", "Ŭ"=>"U", "ŭ"=>"u", "Ů"=>"U", "ů"=>"u", "Ű"=>"U", "ű"=>"u", "Ų"=>"U", "ų"=>"u", "Ŵ"=>"W", "ŵ"=>"w", "Ŷ"=>"Y", "ŷ"=>"y", "Ÿ"=>"Y", "Ź"=>"Z", "ź"=>"z", "Ż"=>"Z", "ż"=>"z", "Ž"=>"Z", "ž"=>"z"}, :german => {"ü"=>"ue", "ö"=>"oe", "ä"=>"ae"}, :spanish => {"Ñ"=>"Nn", "ñ"=>"nn"}}
# File lib/friendly_id/slug_string.rb, line 102
def self.dump_approximations
  Hash[APPROXIMATIONS.map do |name, approx|
    [name, Hash[approx.map {|key, value| [[key].pack("U*"), [value].flatten.pack("U*")]}]]
  end]
end
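To make the conversion above easier to follow, here is a small standalone sketch of the same idea applied to a three-entry table. +TABLE+ and +dump+ are hypothetical names invented for illustration; only the code-point format (decimal keys, a code point or array of code points as values) is taken from the description of APPROXIMATIONS.

```ruby
# Hypothetical table in the same format as APPROXIMATIONS:
# keys are decimal code points, values are a code point or an array of them.
TABLE = {
  :common => { 198 => [65, 69], 231 => 99 },  # U+00C6 => "AE", U+00E7 => "c"
  :german => { 252 => [117, 101] }            # U+00FC => "ue"
}

# Turn every code point (or array of code points) into a readable String.
def dump(table)
  table.each_with_object({}) do |(name, approx), out|
    out[name] = approx.each_with_object({}) do |(key, value), readable|
      readable[[key].pack("U*")] = Array(value).pack("U*")
    end
  end
end

readable = dump(TABLE)
p readable
```

The +"U*"+ directive of +Array#pack+ encodes each integer as a UTF-8 character, which is what makes the numeric table legible.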
@param string [String] The string to use as the basis of the SlugString.
# File lib/friendly_id/slug_string.rb, line 110
def initialize(string)
  super string.to_s
end
Public Instance Methods
Approximate the string in ASCII. This works only for Western strings whose characters are Roman-alphabet letters with diacritics. All other characters, including non-letters, are left unmodified.
string = SlugString.new "Łódź, Poland"
string.approximate_ascii # => "Lodz, Poland"
string = SlugString.new "日本"
string.approximate_ascii # => "日本"
You can pass any key(s) from {APPROXIMATIONS} as arguments. This allows for contextual approximations. By default, +:spanish+ and +:german+ are provided:
string = SlugString.new "Jürgen Müller"
string.approximate_ascii          # => "Jurgen Muller"
string.approximate_ascii :german  # => "Juergen Mueller"
string = SlugString.new "¡Feliz año!"
string.approximate_ascii          # => "¡Feliz ano!"
string.approximate_ascii :spanish # => "¡Feliz anno!"
You can modify the built-in approximations, or add your own:
# Make Spanish use "nh" rather than "nn"
FriendlyId::SlugString::APPROXIMATIONS[:spanish] = {
  # Ñ => "Nh"
  209 => [78, 104],
  # ñ => "nh"
  241 => [110, 104]
}
It's also possible to use a custom approximation for all strings:
FriendlyId::SlugString.approximations << :german
Notice that this method does not simply convert to ASCII; if you want to remove non-ASCII characters such as “¡” and “¿”, use {#to_ascii!}:
string.approximate_ascii!(:spanish) # => "¡Feliz anno!"
string.to_ascii!                    # => "Feliz anno!"
@param *args <Symbol>
@return String
# File lib/friendly_id/slug_string.rb, line 155
def approximate_ascii!(*args)
  @maps = (self.class.approximations + args + [:common]).flatten.uniq
  @wrapped_string = normalize_utf8(:c).unpack("U*").map { |char| approx_char(char) }.flatten.pack("U*")
end
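The helper +approx_char+ is not shown on this page. The sketch below assumes it looks a code point up in each selected map in order and takes the first match, which would explain why +:german+ overrides +:common+ for “ü”. +MAPS+ and +approximate+ are hypothetical names, and the sketch skips the UTF-8 normalization step, so it expects precomposed (NFC) input.

```ruby
# Hypothetical, heavily reduced approximation tables (code point => code points).
MAPS = {
  :common => { 252 => 117, 241 => 110 },  # U+00FC => "u", U+00F1 => "n"
  :german => { 252 => [117, 101] }        # U+00FC => "ue"
}

# Replace each code point using the first map that knows it; maps passed
# as arguments are consulted before :common (an assumption about approx_char).
def approximate(string, *names)
  maps = (names + [:common]).flatten.uniq
  string.unpack("U*").map { |cp|
    name = maps.find { |candidate| MAPS[candidate].key?(cp) }
    name ? MAPS[name][cp] : cp
  }.flatten.pack("U*")
end

plain  = approximate("J\u00FCrgen M\u00FCller")
german = approximate("J\u00FCrgen M\u00FCller", :german)
puts plain
puts german
```

Code points that appear in no map pass through unchanged, which matches the documented behavior for strings such as "日本".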
Removes leading and trailing spaces or dashes, and replaces multiple whitespace characters with a single space. @return String
# File lib/friendly_id/slug_string.rb, line 163
def clean!
  @wrapped_string = @wrapped_string.gsub(/\A\-|\-\z/, '').gsub(/\s+/u, ' ').strip
end
Lowercases the string. Note that this works for Unicode strings, though your mileage may vary with Greek and Turkic strings. @return String
# File lib/friendly_id/slug_string.rb, line 170
def downcase!
  @wrapped_string = apply_mapping :lowercase_mapping
end
Normalize the string for use as a FriendlyId. Note that in this context, +normalize+ means stripping, removing non-letters/numbers, downcasing, and converting whitespace to dashes. ActiveSupport::Multibyte::Chars#normalize is aliased to +normalize_utf8+ in this subclass. @return String
# File lib/friendly_id/slug_string.rb, line 217
def normalize!
  clean!
  word_chars!
  downcase!
  with_dashes!
end
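As a rough illustration of how the four steps combine, the following standalone sketch reproduces the pipeline on plain Strings. The +sketch_*+ helpers and +NON_WORD+ are hypothetical names. The regular expressions and code-point ranges are copied from the method sources on this page, while Ruby's built-in +String#downcase+ is swapped in for the library's +apply_mapping+, so edge cases may differ.

```ruby
# Code-point ranges treated as non-word characters, copied from word_chars!.
NON_WORD = [0..31, 33..44, 46..47, 58..64, 91..96, 123..191]

# Strip a leading/trailing dash, collapse whitespace, trim.
def sketch_clean(s)
  s.gsub(/\A-|-\z/, "").gsub(/\s+/, " ").strip
end

# Drop every code point that falls in one of the NON_WORD ranges.
def sketch_word_chars(s)
  s.unpack("U*").reject { |cp| NON_WORD.any? { |range| range.include?(cp) } }.pack("U*")
end

# Collapse each run of whitespace and dashes into one dash.
def sketch_with_dashes(s)
  s.gsub(/[\s\-]+/, "-")
end

# Same order as normalize!: clean, word_chars, downcase, with_dashes.
def sketch_normalize(s)
  sketch_with_dashes(sketch_word_chars(sketch_clean(s)).downcase)
end

slug = sketch_normalize("  Hello, World!  ")
puts slug
```

Running the dash step last matters: removing punctuation can leave doubled spaces behind, and the final substitution collapses them into a single dash.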
Normalize the string for a given {FriendlyId::Configuration}. @param config [FriendlyId::Configuration] @return String
# File lib/friendly_id/slug_string.rb, line 199
def normalize_for!(config)
  if config.normalizer?
    @wrapped_string = config.normalizer.call(to_s)
  else
    approximate_ascii! if config.approximate_ascii?
    to_ascii! if config.strip_non_ascii?
    normalize!
  end
end
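The branch structure above means a configured normalizer replaces the built-in steps entirely. The sketch below mimics that decision with a hypothetical +NormalizerConfig+ struct. The real FriendlyId::Configuration is not shown on this page, so its fields here are assumptions, and the fallback branch is a simplified stand-in for the library's normalization.

```ruby
# Hypothetical configuration object; field names are assumptions.
NormalizerConfig = Struct.new(:normalizer, :strip_non_ascii)

def sketch_normalize_for(string, config)
  if config.normalizer
    # A custom normalizer wins; none of the built-in steps run.
    config.normalizer.call(string)
  else
    # Simplified stand-in for to_ascii! followed by normalize!.
    string = string.unpack("U*").reject { |cp| cp > 127 }.pack("U*") if config.strip_non_ascii
    string.strip.downcase.gsub(/[\s\-]+/, "-")
  end
end

custom   = NormalizerConfig.new(lambda { |s| s.upcase.tr(" ", "_") }, false)
built_in = NormalizerConfig.new(nil, true)

custom_result   = sketch_normalize_for("Hello World", custom)
built_in_result = sketch_normalize_for("Hello World", built_in)
puts custom_result
puts built_in_result
```

Any object that responds to +call+ would work as the normalizer in this sketch, since the source only invokes +config.normalizer.call(to_s)+.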
Delete any non-ASCII characters. @return String
# File lib/friendly_id/slug_string.rb, line 232
def to_ascii!
  @wrapped_string = normalize_utf8(:c).unpack("U*").reject {|char| char > 127}.pack("U*")
end
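The filtering step can be tried on a plain String without the library. This sketch uses a hypothetical helper, +strip_non_ascii+, and omits the UTF-8 normalization performed by +normalize_utf8(:c)+.

```ruby
# Keep only code points in the ASCII range (0..127).
def strip_non_ascii(string)
  string.unpack("U*").reject { |cp| cp > 127 }.pack("U*")
end

greeting = strip_non_ascii("\u00A1Feliz anno!")  # leading inverted exclamation mark
year     = strip_non_ascii("a\u00F1o")           # n with tilde in the middle
puts greeting
puts year
```

The second call shows why approximation is normally applied first: deleting a letter outright changes the word, whereas approximating it keeps a readable ASCII equivalent.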
Truncate the string to +max+ length. @return String
# File lib/friendly_id/slug_string.rb, line 226
def truncate!(max)
  @wrapped_string = self[0...max].to_s if length > max
end
Upper-cases the string. Note that this works for Unicode strings, though your mileage may vary with Greek and Turkic strings. @return String
# File lib/friendly_id/slug_string.rb, line 239
def upcase!
  @wrapped_string = apply_mapping :uppercase_mapping
end
Validate that the slug string is not blank or reserved, and truncate it to the max length if necessary. @param config [FriendlyId::Configuration] @return String @raise FriendlyId::BlankError @raise FriendlyId::ReservedError
# File lib/friendly_id/slug_string.rb, line 249
def validate_for!(config)
  truncate!(config.max_length)
  raise FriendlyId::BlankError if blank?
  raise FriendlyId::ReservedError if config.reserved?(self)
  self
end
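The order of checks above (truncate, then blank, then reserved) can be sketched without the library. Everything named here is hypothetical: +SketchConfig+, +SketchBlankError+, +SketchReservedError+ and +sketch_validate+ only imitate the shape of the real configuration and error classes, which are not shown on this page.

```ruby
# Hypothetical stand-ins for FriendlyId::BlankError / ReservedError.
class SketchBlankError < StandardError; end
class SketchReservedError < StandardError; end

# Hypothetical configuration with a length limit and a reserved-word list.
SketchConfig = Struct.new(:max_length, :reserved_words) do
  def reserved?(word)
    reserved_words.include?(word.to_s)
  end
end

def sketch_validate(slug, config)
  # Truncate first, so the later checks see the final value.
  slug = slug[0...config.max_length] if slug.length > config.max_length
  raise SketchBlankError if slug.strip.empty?
  raise SketchReservedError if config.reserved?(slug)
  slug
end

config = SketchConfig.new(11, ["new", "edit"])
valid  = sketch_validate("hello-world-again", config)
puts valid
```

Since truncation happens before the reserved-word check, a long slug whose truncated form collides with a reserved word would be rejected in this sketch.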
Replaces each run of whitespace and dashes with a single dash (“-”). @return String
# File lib/friendly_id/slug_string.rb, line 258
def with_dashes!
  @wrapped_string = @wrapped_string.gsub(/[\s\-]+/u, '-')
end
Remove any non-word characters. @return String
# File lib/friendly_id/slug_string.rb, line 176
def word_chars!
  @wrapped_string = normalize_utf8(:c).unpack("U*").map { |char|
    case char
      # control chars
      when 0..31
      # punctuation; 45 is "-" (HYPHEN-MINUS) and allowed
      when 33..44
      # more punctuation
      when 46..47
      # more punctuation and other symbols
      when 58..64
      # brackets and other symbols
      when 91..96
      # braces, pipe, tilde, etc.
      when 123..191
      else char
    end
  }.compact.pack("U*")
end