MIDict Tutorial¶
MIDict is an ordered “dictionary” with multiple indices where any index can serve as “keys” or “values”, capable of assessing multiple values via its powerful indexing syntax, and suitable as a bidirectional/inverse dict (a drop-in replacement for dict/OrderedDict in Python 2 & 3).
Features:
- Multiple indices
- Multi-value indexing syntax
- Convenient indexing shortcuts
- Bidirectional/inverse dict
- Compatible with normal dict in Python 2 & 3
- Accessing keys via attributes
- Extended methods for multi-indices
- Additional APIs to handle indices
- Duplicate keys/values handling
Multiple indices¶
Consider a table-like data set (e.g., a user table):
name | uid | ip |
---|---|---|
jack | 1 | 192.1 |
tony | 2 | 192.2 |
In each index (i.e., column), elements are unique and hashable (suitable for dict keys). Here, a “super dict” is wanted to represent this table which allows any index (column) to be used as the “keys” to index the table. Such a “super dict” is called a multi-index dictionary (MIDict).
A multi-index dictionary user
can be constructed with two arguments:
a list of items (rows of data), and a list of index names:
user = MIDict([['jack', 1, '192.1'],
['tony', 2, '192.2']],
['name', 'uid', 'ip'])
Index names are for easy human understanding and indexing, and thus
must be a string. The index names and items are ordered
in the dictionary. Compatible with a normal dict
, the first index (column)
is the primary index to lookup/index a key, while the rest index or indices
contain the corresponding key’s value or list of values:
user['jack'] -> [1, '192.1']
To use any index (column) as the “keys”, and other one or more
indices as the “values”, just specify the indices via the advanced
“multi-indexing” syntax d[index_key:key, index_value]
, e.g.:
user['name':'jack', 'uid'] -> 1
user['ip':'192.1', 'name'] -> 'jack'
Here, index_key
is the single column used as the “keys”, and key
is
an element in index_key
to locate the row of record (e.g.,
['jack', 1, '192.1']
) in the table. index_value
can be one or more columns
to specify the value(s) from the row of record.
Multi-value indexing syntax¶
For a multi-column data set, it’s useful to be able to access multiple values/columns at the same time.
In the indexing syntax d[index_key:key, index_value]
, both index_key
and index_value
can be a normal index name
or an int
(the order the index), and index_value
can also be a
tuple
, list
or slice
object to specify multiple values/columns,
with the following meanings:
type | meaning | corresponding values |
---|---|---|
int | the index of a key in d.keys() | the value of the key |
tuple/list | multiple keys or indices of keys | list of values |
slice | a range of keys (key_start:key_stop:step) | list of values |
The elements in the tuple/list or key_start
/key_stop
in the slice
syntax can be a normal index name or an int
.
See midict.IndexDict
for more details.
Using the above user
example:
user['name':'jack', ['uid', 'ip']] -> [1, '192.1']
<==> user['name':'jack', [1, 2]]
<==> user['name':'jack', 'uid':]
<==> user[0:'jack', 1:]
Convenient indexing shortcuts¶
Full syntax: d[index_key:key, index_value]
Short syntax:
d[key] <==> d[first_index:key, all_indice_except_first_index]
d[:key] <==> d[None:key] <==> d[last_index:key, all_indice_except_last_index]
d[key, index_value] <==> d[first_index:key, index_value] # only when ``index_value`` is a list or slice object
d[index_key:key, index_value_1, index_value_2, ...] <==> d[index_key:key, (index_value_1, index_value_2, ...)]
Examples:
user['jack'] -> [1, '192.1']
user[:'192.1'] -> ['jack', 1]
user['jack', :] -> ['jack', 1, '192.1']
user['jack', ['uid', 'ip']] -> [1, '192.1']
user[0:'jack', 'uid', 'ip'] -> [1, '192.1']
Bidirectional/inverse dict¶
With the advanced “multi-indexing” syntax, a MIDict with 2 indices can be used as a normal dict, as well as a convenient bidirectional dict to index using either a key or a value:
mi_dict = MIDict(jack=1, tony=2)
Forward indexing like a normal dict (
d[key] -> value
):mi_dict['jack'] -> 1 <==> mi_dict[0:'jack', 1]
Backward/inverse indexing using the slice syntax (
d[:value] -> key
):mi_dict[:1] -> 'jack' <==> mi_dict[-1:1, 0]
Compatible with normal dict in Python 2 & 3¶
A MIDict
with 2 indices is fully compatible with the normal dict
or OrderedDict, and can be used as a drop-in replacement of the latter:
normal_dict = dict(jack=1, tony=2)
mi_dict = MIDict(jack=1, tony=2)
The following equality checks all return True
:
mi_dict == normal_dict
normal_dict['jack'] == mi_dict['jack'] == 1
normal_dict.keys() == mi_dict.keys() == ['tony', 'jack']
normal_dict.values() == mi_dict.values() == [2, 1]
Conversion between MIDict
and dict
is supported in both directions:
mi_dict == MIDict(normal_dict) # True
normal_dict == dict(mi_dict) # True
normal_dict == mi_dict.todict() # True
The MIDict
API also matches the dict
API in Python 2 & 3. For example,
in Python 2, MIDict
has methods keys()
, values()
and items()
that return lists. In Python 3, those methods return dictionary views, just like dict
.
Accessing keys via attributes¶
Use the attribute syntax to access a key in MIDict if it is a valid
Python identifier (d.key <==> d['key']
):
mi_dict.jack <==> mi_dict['jack']
This feature is supported by midict.AttrDict
.
Note that it treats an attribute as a dictionary key only when it can not find a normal attribute with that name. Thus, it is the programmer’s responsibility to choose the correct syntax while writing the code.
Extended methods for multi-indices¶
A series of methods are extended to accept an optional agrument to specify
which index/indices to use, including keys()
, values()
, items()
,
iterkeys()
, itervalues()
, iteritems()
, viewkeys()
, viewvalues()
,
viewitems()
, __iter__()
and __reversed__()
:
user = MIDict([['jack', 1, '192.1'],
['tony', 2, '192.2']],
['name', 'uid', 'ip'])
user.keys() <==> user.keys(0) <==> user.keys('name') -> ['jack', 'tony']
user.keys('uid') <==> user.keys(1) -> [1, 2]
user.values() <==> user.values(['uid', 'ip']) -> [[1, '192.1'], [2, '192.2']]
user.values('uid') -> [1, 2]
user.values(['name','ip']) -> [['jack', '192.1'], ['tony', '192.2']]
user.items() <==> user.values(['name', 'uid', 'ip'])
-> [['jack', 1, '192.1'], ['tony', 2, '192.2']]
user.items(['name','ip']) -> [['jack', '192.1'], ['tony', '192.2']]
MIDict also provides two handy methods d.viewdict(index_key, index_value)
and d.todict(dict_type, index_key, index_value)
to view it as a normal
dict or convert it to a specific type of dict using specified indices as
keys and values.
Additional APIs to handle indices¶
MIDict provides special methods (d.reorder_indices()
, d.rename_index()
,
d.add_index()
, d.remove_index()
) to handle the indices:
d = MIDict([['jack', 1], ['tony', 2]], ['name', 'uid'])
d.reorder_indices(['uid', 'name'])
d -> MIDict([[1, 'jack'], [2, 'tony']], ['uid', 'name'])
d.reorder_indices(['name', 'uid']) # change back indices
d.rename_index('uid', 'userid') # rename one index
<==> d.rename_index(['name', 'userid']) # rename all indices
d -> MIDict([['jack', 1], ['tony', 2]], ['name', 'userid'])
d.add_index(values=['192.1', '192.2'], name='ip')
d -> MIDict([['jack', 1, '192.1'], ['tony', 2, '192.2']],
['name', 'userid', 'ip'])
d.remove_index('userid')
d -> MIDict([['jack', '192.1'], ['tony', '192.2']], ['name', 'ip'])
d.remove_index(['name', 'ip']) # remove multiple indices
d -> MIDict() # empty
Duplicate keys/values handling¶
The elements in each index of MIDict should be unique.
When setting an item using syntax d[index_key:key, index_value] = value2
,
if key
already exists in index_key
, the item of key
will be updated
according to index_value
and value2
(similar to updating the value of a key in
a normal dict
). However, if any value of value2
already exists in index_value
, a ValueExistsError
will be raised.
When constructing a MIDict or updating it with d.update()
,
duplicate keys/values are handled in the same way as above with
the first index treated as index_key
and the rest indices treated as index_value
:
d = MIDict(jack=1, tony=2)
d['jack'] = 10 # replace value of key 'jack'
d['tom'] = 3 # add new key/value
d['jack'] = 2 # raise ValueExistsError
d['alice'] = 2 # raise ValueExistsError
d[:2] = 'jack' # raise ValueExistsError
d['jack', :] = ['tony', 22] # raise ValueExistsError
d['jack', :] = ['jack2', 11] # replace key 'jack' to a new key 'jack2' and value to 11
d.update([['alice', 2]]) # raise ValueExistsError
d.update(alice=2) # raise ValueExistsError
d.update(alice=4) # add new key/value
MIDict([['jack',1]], jack=2) # {'jack': 2}
MIDict([['jack',1], ['jack',2]]) # {'jack': 2}
MIDict([['jack',1], ['tony',1]]) # raise ValueExistsError
MIDict([['jack',1]], tony=1) # raise ValueExistsError
Internal data struture¶
Essentially MIDict
is a Mapping
type, and it stores the data in the form of
{key: value}
for 2 indices (identical to a normal dict
) or
{key: list_of_values}
for more than 2 indices.
Additionally, MIDict uses a special attribute d.indices
to store
the indices, which is an IdxOrdDict
instance with the index names as keys
(the value of the first index is the MIDict
instance itself, and the value of
each other index is an AttrOrdDict
instance which maps each element in that index
to its corresponding element in the first index):
d = MIDict([['jack', 1], ['tony', 2]], ['name', 'uid'])
d.indices ->
IdxOrdDict([
('name', MIDict([('jack', 1), ('tony', 2)], ['name', 'uid'])),
('uid', AttrOrdDict([(1, 'jack'), (2, 'tony')])),
])
Thus, d.indices
also presents an interface to access the indices and items.
For example, access index names:
'name' in d.indices -> True
list(d.indices) -> ['name', 'uid']
d.indices.keys() -> ['name', 'uid']
Access items in an index:
'jack' in d.indices['name'] -> True
1 in d.indices['uid'] -> True
list(d.indices['name']) -> ['jack', 'tony']
list(d.indices['uid']) -> [1, 2]
d.indices['name'].keys() -> ['jack', 'tony']
d.indices['uid'].keys() -> [1, 2]
d.indices
also supports the attribute syntax:
d.indices.name -> MIDict([('jack', 1), ('tony', 2)], ['name', 'uid'])
d.indices.uid -> AttrOrdDict([(1, 'jack'), (2, 'tony')])
However, the keys/values in d.indices
should not be directly changed,
otherwise the structure or the references may be broken.
Use the methods of d
rather than d.indices
to operate the data.
More examples of advanced indexing¶
Example of two indices (compatible with normal dict):
color = MIDict([['red', '#FF0000'], ['green', '#00FF00']], ['name', 'hex']) # flexible indexing of short and long versions: color.red # -> '#FF0000' <==> color['red'] <==> color['name':'red'] <==> color[0:'red'] <==> color[-2:'red'] <==> color['name':'red', 'hex'] <==> color[0:'red', 'hex'] <==> color[-2:'red', 1] color[:'#FF0000'] # -> 'red' <==> color['hex':'#FF0000'] <==> color[1:'#FF0000'] <==> color[-1:'#FF0000'] <==> color['hex':'#FF0000', 'name'] <==> color[1:'#FF0000', 0] # setting an item using different indices/keys: color.blue = '#0000FF' <==> color['blue'] = '#0000FF' <==> color['name':'blue'] = '#0000FF' <==> color['name':'blue', 'hex'] = '#0000FF' <==> color[0:'blue', 1] = '#0000FF' <==> color[:'#0000FF'] = 'blue' <==> color[-1:'#0000FF'] = 'blue' <==> color['hex':'#0000FF'] = 'blue' <==> color['hex':'#0000FF', 'name'] = 'blue' <==> color[1:'#0000FF', 0] = 'blue' # result: # color -> MIDict([['red', '#FF0000'], ['green', '#00FF00'], ['blue', '#0000FF']], ['name', 'hex'])
Example of three indices:
user = MIDict([[1, 'jack', '192.1'], [2, 'tony', '192.2']], ['uid', 'name', 'ip']) user[1] -> ['jack', '192.1'] user['name':'jack'] -> [1, '192.1'] user['uid':1, 'ip'] -> '192.1' user[1, ['name','ip']] -> ['jack', '192.1'] user[1, ['name',-1]] -> ['jack', '192.1'] user[1, [1,1,0,0,2,2]] -> ['jack', 'jack', 1, 1, '192.1', '192.1'] user[1, :] -> [1, 'jack', '192.1'] user[1, ::2] -> [1, '192.1'] user[1, 'name':] -> ['jack', '192.1'] user[1, 0:-1] -> [1, 'jack'] user[1, 'name':-1] -> ['jack'] user['uid':1, 'name','ip'] -> ['jack', '192.1'] user[0:3, ['name','ip']] = ['tom', '192.3'] # set a new item explictly <==> user[0:3] = ['tom', '192.3'] # set a new item implicitly # result: # user -> MIDict([[1, 'jack', '192.1'], [2, 'tony', '192.2'], [3, 'tom', '192.3']], ['uid', 'name', 'ip'])
More classes and functions¶
Check midict package API for more classes and functions,
such as midict.FrozenMIDict
, midict.AttrDict
, midict.IndexDict
,
midict.MIDictView
, etc.