# Grouping words and more

I'm working on a project to learn Clojure in practice. I'm doing well, but sometimes I get stuck. This time I need to transform a sequence of the form:

``````[":keyword0" "word0" "word1" ":keyword1" "word2" "word3"]
``````

into:

``````[[:keyword0 "word0" "word1"] [:keyword1 "word2" "word3"]]
``````

I've been trying for at least two hours, but I don't know enough Clojure functions to compose something that solves the problem in a functional manner.

I think this transformation should involve some kind of partitioning; here is my attempt:

``````(partition-by (fn [x] (.startsWith x ":")) *1)
``````

But the result looks like this:

``````((":keyword0") ("word0" "word1") (":keyword1") ("word2" "word3"))
``````

Now I would have to group it again, and I doubt I'm doing the right thing here. Also, I need to convert strings (only those that begin with `:`) into keywords. I think this combination should work:

``````(keyword (subs ":keyword0" 1))
``````
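A sketch of wrapping that in a helper so that only strings starting with `:` are converted. The name `parse-token` is made up for illustration, and `clojure.string/starts-with?` assumes Clojure 1.8 or later:

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: turn ":foo" into :foo, leave other strings untouched.
(defn parse-token [s]
  (if (str/starts-with? s ":")
    (keyword (subs s 1))
    s))

(mapv parse-token [":keyword0" "word0" "word1"])
;; => [:keyword0 "word0" "word1"]
```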

How can I write a function that performs this transformation in the most idiomatic way?

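``````(defn group-that [arg]
  (if (not-empty arg)
    ;; list: remaining items, acc: group being built, result: finished groups
    ;; note: the ":"-prefixed strings are kept as strings here
    (loop [list arg, acc [], result []]
      (if (not-empty list)
        (if (.startsWith (first list) ":")
          ;; a ":"-prefixed string starts a new group
          (if (not-empty acc)
            (recur (rest list) (vector (first list)) (conj result acc))
            (recur (rest list) (vector (first list)) result))
          (recur (rest list) (conj acc (first list)) result))
        ;; input exhausted: emit the last group
        (conj result acc)))))
``````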

This needs just one pass over the seq and no macros.

Here is a high-performance version using `reduce`:

``````(reduce (fn [acc next]
          (if (.startsWith next ":")
            (conj acc [(-> next (subs 1) keyword)])
            (conj (pop acc) (conj (peek acc) next))))
        [] data)
``````
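One caveat with the `reduce` version: if the input does not start with a `:`-prefixed string, `(pop acc)` is called on an empty vector and throws. The sketch below shows one possible guard, under the assumption that words appearing before any keyword should simply be dropped. `group-words` is an illustrative name, not something from the thread.

```clojure
(require '[clojure.string :as str])

(defn group-words [tokens]
  (reduce (fn [acc token]
            (cond
              ;; a ":name" string starts a new group headed by the keyword
              (str/starts-with? token ":")
              (conj acc [(keyword (subs token 1))])

              ;; assumption: words seen before any keyword are dropped
              (empty? acc)
              acc

              ;; otherwise append the word to the most recent group
              :else
              (conj (pop acc) (conj (peek acc) token))))
          []
          tokens))

(group-words [":keyword0" "word0" "word1" ":keyword1" "word2" "word3"])
;; => [[:keyword0 "word0" "word1"] [:keyword1 "word2" "word3"]]
```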

Alternatively, you could extend your code like this:

``````(->> data
     (partition-by #(.startsWith % ":"))
     (partition 2)
     (map (fn [[[kw-str] strs]]
            (cons (-> kw-str
                      (subs 1)
                      keyword)
                  strs))))
``````

Since the question is already here... This is my best effort:

``````(def data [":keyword0" "word0" "word1" ":keyword1" "word2" "word3"])

(->> data
     (partition-by (fn [x] (.startsWith x ":")))
     (partition 2)
     (map (fn [[[k] w]] (apply conj [(keyword (subs k 1))] w))))
``````

I'm still looking for a better solution or criticism of this one.
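As one piece of criticism, offered as a sketch rather than a definitive answer: `(apply conj [...] w)` can be written with `into`, and `mapv` returns the vector of vectors the question asks for instead of a lazy seq. The pairing done by `(partition 2)` still assumes that the data starts with a keyword string and that every keyword is followed by at least one word. `group-pairs` is a hypothetical name.

```clojure
(require '[clojure.string :as str])

(defn group-pairs [tokens]
  (->> tokens
       ;; split into alternating runs: keyword strings, then plain words
       (partition-by #(str/starts-with? % ":"))
       ;; pair each keyword run with the word run that follows it
       (partition 2)
       (mapv (fn [[[k] words]]
               (into [(keyword (subs k 1))] words)))))

(group-pairs [":keyword0" "word0" "word1" ":keyword1" "word2" "word3"])
;; => [[:keyword0 "word0" "word1"] [:keyword1 "word2" "word3"]]
```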

First, let's construct a function that breaks vector `v` into sub-vectors, with a break before every element for which the predicate `pred` holds.

``````(defn breakv-by [pred v]
  (let [break-points (filter identity (map-indexed (fn [n x] (when (pred x) n)) v))
        starts (cons 0 break-points)
        finishes (concat break-points [(count v)])]
    (mapv (partial subvec v) starts finishes)))
``````

For our case, given

``````(def data [":keyword0" "word0" "word1" ":keyword1" "word2" "word3"])
``````

then

``````(breakv-by #(= (first %) \:) data)
``````

produces

``````[[] [":keyword0" "word0" "word1"] [":keyword1" "word2" "word3"]]
``````

Notice that the initial sub-vector is different:

• It has no element for which the predicate holds.
• It can be of length zero.

All the others

• start with their only element for which the predicate holds and
• are at least of length 1.
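Both cases can be checked at the REPL. The snippet repeats `breakv-by` so that it is self-contained; the numeric inputs are made up for illustration.

```clojure
(defn breakv-by [pred v]
  (let [break-points (filter identity (map-indexed (fn [n x] (when (pred x) n)) v))
        starts       (cons 0 break-points)
        finishes     (concat break-points [(count v)])]
    (mapv (partial subvec v) starts finishes)))

;; the initial sub-vector holds everything before the first match
(breakv-by even? [1 3 4 5 6 7])
;; => [[1 3] [4 5] [6 7]]

;; a match at index 0 leaves the initial sub-vector empty
(breakv-by even? [2 3])
;; => [[] [2 3]]

;; empty input yields a single empty sub-vector
(breakv-by even? [])
;; => [[]]
```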

So `breakv-by` behaves properly with data that

For the purposes of the question, we need to muck about with what `breakv-by` produces somewhat:
``````(let [pieces (breakv-by #(= (first %) \:) data)]