I need to parse uploaded vCard (.vcf) files for one of the projects I am
working on. There are not many vCard libraries for Ruby out there.
Well, actually there is only one: vpim.

Everything worked just fine with the test files I found on the Internet,
until I hit a wall with a vcf exported from Apple Address Book. A few surprises:
accented characters (non-US-ASCII in general), plus EMAIL and TEL lines
without a value.
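
To make the second surprise concrete, here is a small sketch that lists EMAIL and TEL content lines with a blank value before the text ever reaches a parser. The sample card is made up for illustration (it is not an actual Address Book export), and `empty_valued_lines` is a hypothetical helper name, not something vpim provides. It ignores folded lines, so treat it as a diagnostic aid only.

```ruby
# Made-up sample data for illustration; not an actual Address Book export.
SAMPLE_VCARD = <<~VCF
  BEGIN:VCARD
  VERSION:3.0
  FN:Zoë Example
  EMAIL;type=INTERNET:
  item1.TEL;type=CELL:
  TEL;type=HOME:555-0100
  END:VCARD
VCF

# Returns the content lines whose property name is in +names+ and whose
# value (everything after the first colon) is blank.
def empty_valued_lines(text, names = %w[EMAIL TEL])
  text.each_line.map(&:chomp).select do |line|
    head, value = line.split(':', 2)
    next false if value.nil? # not a "name:value" content line
    # Drop parameters (";type=...") and an optional group prefix ("item1.").
    property = head.split(';').first.to_s.split('.').last.to_s.upcase
    names.include?(property) && value.strip.empty?
  end
end

puts empty_valued_lines(SAMPLE_VCARD)
```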

It took me a while to figure out the encoding of the Address Book (AAB) files. It is
UTF-16LE. The vpim gem does not handle it at all, so I had to write my own
converter that takes a string in whatever encoding I can guess and turns it
into UTF-8. I borrowed some code from vpim, found some nice PHP code (a vCard
to CSV converter), and came up with this for UTF-8 enforcement:

  # Defined on String (self is the raw file content); blank? comes from ActiveSupport.
  def to_utf8(encoding=nil)
    begin
      unless encoding.blank?
        return Iconv.iconv('UTF-8',encoding,self).shift
      end
      case self
      when /^\xEF\xBB\xBF/
        #0xEF 0xBB 0xBF: UTF-8 with a BOM, the BOM is stripped
        Iconv.iconv('UTF-8','UTF-8',self.sub(/^\xEF\xBB\xBF/,'')).shift
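      when /^\x00\x00\xFE\xFF/
        # 0x00 0x00 0xFE 0xFF: UTF-32 with a BOM (big-endian)
        # Iconv.iconv takes the target encoding first, then the source encoding
        Iconv.iconv('UTF-8','UTF-32BE',self).shift
      when /^\xFF\xFE\x00\x00/
        # 0xFF 0xFE 0x00 0x00: UTF-32 with a BOM (little-endian)
        Iconv.iconv('UTF-8','UTF-32LE',self).shift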
      when /^\xFE\xFF/
        # - 0xFE 0xFF: UTF-16 with a BOM (big-endian), the BOM is stripped and string
        #   is converted to UTF-8  
        arr = self.unpack('n*')
        arr.shift
        arr.pack('U*')
      when /^\xFF\xFE/
        # - 0xFF 0xFE: UTF-16 with a BOM (little-endian), the BOM is stripped and string
        #   is converted to UTF-8
        arr = self.unpack('v*')
        arr.shift
        arr.pack('U*')
      when /^\x00\x62/i
        # - 0x00 'B' or 0x00 'b': UTF-16 (big-endian), the string is converted to UTF-8
        self.unpack('n*').pack('U*')
      when /^\x62\x00/i
        # - 'B' 0x00 or 'b' 0x00: UTF-16 (little-endian), the string is converted to UTF-8
        self.unpack('v*').pack('U*')
      when /(?x-mi:^(?:[\xC2-\xDF][\x80-\xBF]|
                      \xE0[\xA0-\xBF][\x80-\xBF]|
                      [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|
                      \xED[\x80-\x9F][\x80-\xBF]|
                      \xF0[\x90-\xBF][\x80-\xBF]{2}|
                      [\xF1-\xF3][\x80-\xBF]{3}|
                      \xF4[\x80-\x8F][\x80-\xBF]{2}|[\r\n\t -~])*$)/
        # http://w3.org/International/questions/qa-forms-utf-8.html
        # mix of valid UTF8 and US-ASCII + new line, carriage return, tab and space
        return self.clone
      when /^\x00[^\x00]/
        Iconv.iconv('UTF-8','UTF-16BE',self).shift
      when /^[^\x00]\x00/
        Iconv.iconv('UTF-8','UTF-16LE',self).shift
      when /^\x00\x00\x00[^\x00]/
        Iconv.iconv('UTF-8','UTF-32BE',self).shift
      when /^[^\x00]\x00\x00\x00/
        Iconv.iconv('UTF-8','UTF-32LE',self).shift
      else
        raise ArgumentError, "Unknown encoding"
      end
    rescue Iconv::IllegalSequence, Iconv::InvalidEncoding => e
      raise ArgumentError, e
    end
  end # utf8

That solved half of my problems.
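
Iconv was removed from Ruby's standard library in 2.0, so on a newer Ruby the BOM branches of to_utf8 could be sketched with String#force_encoding and String#encode instead. This is only a rough equivalent under that assumption: `BOMS` and `bom_to_utf8` are names I made up for the example, and it covers the BOM cases only, not the BOM-less guessing.

```ruby
# Hypothetical helper, not part of vpim: BOM-based detection only.
# Order matters: the UTF-32LE BOM starts with the UTF-16LE BOM bytes.
BOMS = [
  ["\xEF\xBB\xBF".b,     Encoding::UTF_8],
  ["\x00\x00\xFE\xFF".b, Encoding::UTF_32BE],
  ["\xFF\xFE\x00\x00".b, Encoding::UTF_32LE],
  ["\xFE\xFF".b,         Encoding::UTF_16BE],
  ["\xFF\xFE".b,         Encoding::UTF_16LE]
].freeze

# Strips a recognised BOM and returns the rest of the data as UTF-8.
def bom_to_utf8(raw)
  bytes = raw.b # binary copy, so start_with? compares raw bytes
  bom, encoding = BOMS.find { |marker, _| bytes.start_with?(marker) }
  raise ArgumentError, "Unknown encoding" unless bom

  body = bytes.byteslice(bom.bytesize..-1)
  body.force_encoding(encoding).encode(Encoding::UTF_8)
rescue EncodingError => e
  # Mirror the original behaviour: conversion failures become ArgumentError.
  raise ArgumentError, e.message
end
```

A file read in binary mode (for example with File.binread) can be passed straight to bom_to_utf8; input without a BOM raises ArgumentError, just like the else branch of the original.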

The other problem, as I already mentioned, is EMAIL and TEL without a
value. The vpim parser is very strict: no value, raise an exception. That is not
very friendly when you parse a file with several cards. While I am not
ready to follow up with the vpim developers, I've put in a monkey patch, in the
style of ActiveSupport's magical 'alias_method_chain':

  gem 'vpim', '= 0.360' # require_gem is deprecated in newer RubyGems
  require 'vpim/vcard'
  module VpimEmptyValue #empty phone/email value monkey patch

    def self.included(base)
      base.extend(MonkeyPatchClassMethods)
      base.class_eval do
        class << self
          alias_method :decode_without_empty, :decode unless method_defined?(:decode_without_empty)
          alias_method :decode, :decode_with_empty
        end
      end
    end
      
    module MonkeyPatchClassMethods
      def decode_with_empty(field)
        if (value = field.to_text.strip).length < 1
          return new('')
        end
        decode_without_empty(field)
      end
      
    end
  end
  
  Vpim::Vcard::Telephone.send(:include, Uping::Contacts::VpimEmptyValue)
  Vpim::Vcard::Email.send(:include, Uping::Contacts::VpimEmptyValue)

The challenge was how to alias class methods in Ruby, and it was easy
once I found an example on the Internet.
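
For anyone who wants to try the trick without vpim installed, here is a stripped-down sketch of the same idea. `StrictField` and `EmptyValueTolerant` are hypothetical stand-ins I made up for the example; the point is only that class methods live in the singleton class, so that is where alias_method has to run.

```ruby
# Hypothetical stand-in for a strict parser class such as Vpim::Vcard::Telephone.
class StrictField
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Raises on blank input, like the strict decoder described above.
  def self.decode(text)
    raise ArgumentError, "no value" if text.strip.empty?
    new(text.strip)
  end
end

module EmptyValueTolerant
  def self.included(base)
    base.extend(ClassMethods)
    # class << base opens the singleton class, where class methods live.
    class << base
      alias_method :decode_without_empty, :decode unless method_defined?(:decode_without_empty)
      alias_method :decode, :decode_with_empty
    end
  end

  module ClassMethods
    # Returns an empty field for blank input, otherwise calls the original decode.
    def decode_with_empty(text)
      return new('') if text.strip.empty?
      decode_without_empty(text)
    end
  end
end

StrictField.send(:include, EmptyValueTolerant)
```

After the include, StrictField.decode returns a field with an empty value for blank input, while StrictField.decode_without_empty still behaves like the strict original.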

2007/10/24. The regexp that verifies UTF-8 has to come after the BOM checks; otherwise a UTF-8 file with a BOM passes the check and keeps its BOM.

2007/10/24. A vCard file may not have an N line. It will parse without a problem; however, when you try to access card.name, an exception is thrown.

  module VpimEmptyName
    def self.included(base)
      base.class_eval do
        def name_with_empty_value
          name_without_empty_value rescue Vpim::Vcard::Name.new
        end
        alias_method :name_without_empty_value, :name unless method_defined?(:name_without_empty_value)
        alias_method :name, :name_with_empty_value
      end
    end
  end

  Vpim::Vcard.send(:include, Uping::Contacts::VpimEmptyName)