I needed to parse uploaded vCard (.vcf) files for one of the projects
I am
working on. There are not that many Ruby libraries for this out there.
Well, actually there is only one: vpim.
Everything was working just fine with test files I found on the Internet,
until I hit a wall with a vcf file exported from
Apple Address Book. A few surprises:
- accented characters, or non-US-ASCII characters in general
- EMAIL and TEL fields without a value
It took me a while to figure out the encoding of Apple Address Book (AAB) files. It is
UTF-16LE.
The vpim gem does not handle it at all, so I had to write my own
converter that turns the string from whatever encoding I can guess into UTF-8.
I borrowed
some code from vpim, found some nice PHP code (a vCard to CSV converter),
and came up with
this for UTF-8 enforcement:
# Defined on String (the method works on self)
def to_utf8(encoding = nil)
  begin
    unless encoding.blank?
      return Iconv.iconv('UTF-8', encoding, self).shift
    end
    case self
    when /^\xEF\xBB\xBF/
      # 0xEF 0xBB 0xBF: UTF-8 with a BOM, the BOM is stripped
      Iconv.iconv('UTF-8', 'UTF-8', self.sub(/^\xEF\xBB\xBF/, '')).shift
    when /^\x00\x00\xFE\xFF/
      # 0x00 0x00 0xFE 0xFF: UTF-32 with a BOM (big-endian)
      # Iconv.iconv takes (to, from, string)
      Iconv.iconv('UTF-8', 'UTF-32BE', self).shift
    when /^\xFF\xFE\x00\x00/
      # 0xFF 0xFE 0x00 0x00: UTF-32 with a BOM (little-endian)
      Iconv.iconv('UTF-8', 'UTF-32LE', self).shift
    when /^\xFE\xFF/
      # 0xFE 0xFF: UTF-16 with a BOM (big-endian), the BOM is stripped and string
      # is converted to UTF-8
      # (code units are mapped one-to-one, so surrogate pairs are not handled)
      arr = self.unpack('n*')
      arr.shift
      arr.pack('U*')
    when /^\xFF\xFE/
      # 0xFF 0xFE: UTF-16 with a BOM (little-endian), the BOM is stripped and string
      # is converted to UTF-8
      arr = self.unpack('v*')
      arr.shift
      arr.pack('U*')
    when /^\x00\x62/i
      # 0x00 'B' or 0x00 'b': UTF-16 (big-endian), the string is converted to UTF-8
      self.unpack('n*').pack('U*')
    when /^\x62\x00/i
      # 'B' 0x00 or 'b' 0x00: UTF-16 (little-endian), the string is converted to UTF-8
      self.unpack('v*').pack('U*')
    when /(?x-mi:^(?:[\xC2-\xDF][\x80-\xBF]| \xE0[\xA0-\xBF][\x80-\xBF]| [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}| \xED[\x80-\x9F][\x80-\xBF]| \xF0[\x90-\xBF][\x80-\xBF]{2}| [\xF1-\xF3][\x80-\xBF]{3}| \xF4[\x80-\x8F][\x80-\xBF]{2}|[\r\n\t -~])*$)/
      # http://w3.org/International/questions/qa-forms-utf-8.html
      # mix of valid UTF-8 and US-ASCII + new line, carriage return, tab and space
      return self.clone
    when /^\x00[^\x00]/
      Iconv.iconv('UTF-8', 'UTF-16BE', self).shift
    when /^[^\x00]\x00/
      Iconv.iconv('UTF-8', 'UTF-16LE', self).shift
    when /^\x00\x00\x00[^\x00]/
      Iconv.iconv('UTF-8', 'UTF-32BE', self).shift
    when /^[^\x00]\x00\x00\x00/
      Iconv.iconv('UTF-8', 'UTF-32LE', self).shift
    else
      raise ArgumentError, "Unknown encoding"
    end
  rescue Iconv::IllegalSequence, Iconv::InvalidEncoding => e
    raise ArgumentError, e
  end
end # to_utf8
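Iconv was removed from the Ruby standard library in Ruby 2.0, so on a current Ruby the BOM-sniffing part of the idea above could be sketched with String#encode instead. This is only a minimal sketch and not the code used in the project: `bom_to_utf8` and `BOMS` are made-up names, and it covers only input that starts with a BOM, not the BOM-less guesses.

```ruby
# Hypothetical helper (not from vpim or the original project):
# detect the encoding from a byte order mark and convert to UTF-8.
BOMS = [
  ["\x00\x00\xFE\xFF".b, Encoding::UTF_32BE],
  ["\xFF\xFE\x00\x00".b, Encoding::UTF_32LE], # must be tested before UTF-16LE
  ["\xEF\xBB\xBF".b,     Encoding::UTF_8],
  ["\xFE\xFF".b,         Encoding::UTF_16BE],
  ["\xFF\xFE".b,         Encoding::UTF_16LE],
].freeze

def bom_to_utf8(raw)
  bytes = raw.b                                # binary copy, compare raw bytes
  BOMS.each do |bom, encoding|
    next unless bytes.start_with?(bom)
    body = bytes.byteslice(bom.bytesize..-1)   # strip the BOM
    return body.force_encoding(encoding).encode(Encoding::UTF_8)
  end
  raise ArgumentError, "Unknown encoding"
end
```

Unlike the unpack/pack approach, String#encode also converts UTF-16 surrogate pairs correctly.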
That solved half of the problems.
The other problem, as I already mentioned, is EMAIL and TEL without a
value. The vpim parser is very strict: no value means it raises an exception,
which is not
very friendly when you parse a file with several cards. While I am not
ready to follow up with the vpim developers,
I've put in a monkey patch, thanks
to the magical ActiveSupport 'alias_method_chain' pattern:
require_gem 'vpim', '= 0.360'
# In the project this module lives inside the Uping::Contacts namespace
# (see the include calls below)
module VpimEmptyValue # empty phone/email value monkey patch
  def self.included(base)
    base.extend(MonkeyPatchClassMethods)
    base.class_eval do
      class << self
        alias_method :decode_without_empty, :decode unless method_defined?(:decode_without_empty)
        alias_method :decode, :decode_with_empty
      end
    end
  end
  module MonkeyPatchClassMethods
    def decode_with_empty(field)
      # an empty value yields an empty object instead of an exception
      if (value = field.to_text.strip).length < 1
        return new('')
      end
      decode_without_empty(field)
    end
  end
end
Vpim::Vcard::Telephone.send(:include, Uping::Contacts::VpimEmptyValue)
Vpim::Vcard::Email.send(:include, Uping::Contacts::VpimEmptyValue)
The challenge was how to alias class methods in Ruby, and it was easy
once I found an example on the Internet.
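Aliasing a class method comes down to opening the singleton class with `class << self`. Here is a self-contained sketch of that trick; the `Parser` class is a made-up stand-in for a strict parser and has nothing to do with vpim's actual code.

```ruby
# Hypothetical stand-in for a strict parser class; not vpim code.
class Parser
  def self.decode(text)
    raise ArgumentError, "empty value" if text.strip.empty?
    "decoded: #{text.strip}"
  end
end

class Parser
  class << self
    # Inside `class << self`, alias_method works on class methods.
    alias_method :decode_without_empty, :decode

    def decode_with_empty(text)
      return "" if text.strip.empty?   # tolerate empty values
      decode_without_empty(text)       # otherwise defer to the original
    end

    alias_method :decode, :decode_with_empty
  end
end
```

After this, `Parser.decode` goes through the tolerant wrapper, while the strict original stays reachable as `Parser.decode_without_empty`.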
2007/10/24. A vCard file may not have an N line. It will parse without a problem; however, when you try to access card.name, an exception is thrown.
module VpimEmptyName
  def self.included(base)
    base.class_eval do
      def name_with_empty_value
        name_without_empty_value rescue Vpim::Vcard::Name.new
      end
      alias_method :name_without_empty_value, :name unless method_defined?(:name_without_empty_value)
      alias_method :name, :name_with_empty_value
    end
  end
end
Vpim::Vcard.send(:include, Uping::Contacts::VpimEmptyName)
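One caveat with the `rescue` modifier is that it swallows every StandardError, not only the "missing N" case. A more cautious variant of the same wrapping could rescue one specific error class. The sketch below uses made-up stand-ins (`Card`, `MissingField`, `EmptyName`) because I have not checked which exception vpim actually raises here.

```ruby
# Hypothetical stand-ins; these are not vpim classes.
class MissingField < StandardError; end

class Card
  def initialize(fields)
    @fields = fields
  end

  # Strict accessor: raises when the field is absent.
  def name
    @fields.fetch(:name) { raise MissingField, "no N field" }
  end
end

module EmptyName
  def self.included(base)
    base.class_eval do
      alias_method :name_without_empty_value, :name

      def name_with_empty_value
        name_without_empty_value
      rescue MissingField   # narrower than a bare `rescue` modifier
        ""
      end

      alias_method :name, :name_with_empty_value
    end
  end
end

Card.send(:include, EmptyName)
```

With this in place, a card without a name returns an empty string, and any other kind of error still surfaces.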