Add caching #86

Merged (1 commit) on Apr 3, 2024
10 changes: 10 additions & 0 deletions deps-lock.json
@@ -1547,6 +1547,16 @@
"mvn-repo": "https://repo.maven.apache.org/maven2/",
"hash": "sha256-hML6t6Mso8HkDEGm7Mm9U26UezBYDne41dwjKjSSXqw="
},
+{
+"mvn-path": "org/clojure/core.memoize/1.0.257/core.memoize-1.0.257.jar",
+"mvn-repo": "https://repo.maven.apache.org/maven2/",
+"hash": "sha256-mg6RgW4hp3SY7+3r1HrUvcL7+X+dvEm8nZWU4gEkbpY="
+},
+{
+"mvn-path": "org/clojure/core.memoize/1.0.257/core.memoize-1.0.257.pom",
+"mvn-repo": "https://repo.maven.apache.org/maven2/",
+"hash": "sha256-3QQaWFudj1eN30s82rhS8/XdKajjNl4d1ehftl4/c9w="
+},
{
"mvn-path": "org/clojure/core.rrb-vector/0.0.11/core.rrb-vector-0.0.11.jar",
"mvn-repo": "https://repo.maven.apache.org/maven2/",
3 changes: 2 additions & 1 deletion deps.edn
@@ -19,7 +19,8 @@
ring-cors/ring-cors {:mvn/version "0.1.13"}
ring/ring-core {:mvn/version "1.9.5"}
ring/ring-jetty-adapter {:mvn/version "1.9.5"}
-tech.tablesaw/tablesaw-core {:mvn/version "0.43.1"}}
+tech.tablesaw/tablesaw-core {:mvn/version "0.43.1"}
+org.clojure/core.memoize {:mvn/version "1.0.257"}}
:paths ["src" "resources"]
:aliases {:test {:extra-paths ["test"]
:extra-deps {com.gfredericks/test.chuck {:mvn/version "0.2.13"}
1 change: 1 addition & 0 deletions package.json
@@ -25,5 +25,6 @@
"shadow-cljs": "^2.27.5"
},
"dependencies": {
+"memoizee": "^0.4.15"
}
}
20 changes: 20 additions & 0 deletions src/inferenceql/query/cache.cljc
@@ -0,0 +1,20 @@
(ns inferenceql.query.cache
  "For caching expensive results."
  (:require #?(:clj [clojure.core.memoize :as memo]
               :cljs ["memoizee" :as memoizee])
            [clojure.string :as string]))

(def default-threshold 100)


(defn lru
  "Memoizes a fn with a least-recently-used eviction policy.

  After the number of cached results exceeds the threshold, the
  least-recently-used ones will be evicted."
  ([f]
   (lru f default-threshold))
  ([f lru-threshold]
   #?(:clj (memo/lru f :lru/threshold lru-threshold)
      :cljs (memoizee f #js {"max" lru-threshold
                             "normalizer" js/JSON.stringify}))))


I wonder if we'll run into trouble using js/JSON.stringify on some of the arguments passed to these functions, like model (which can be a special type, e.g. a reify) or ClojureScript data types, whose JSON output looks pretty weird (though maybe that's OK if it's more performant than calling clj->js first...).

[screenshot not preserved]

It might be worth doing some simple benchmarks before committing to an approach, as some of these conversions can be surprisingly costly.

KingMob (Contributor, Author), Mar 22, 2024

My predictions are that it should be fine, but it won't hurt to check. Will have to wait until I get back from vacation, though.

The serialized weirdness should be OK as long as it never produces accidentally identical string representations for different arguments. If anything, I expect the opposite problem: some arguments that could share a cache key won't. Luckily, that just means some cache misses.

I would be very surprised if clj->js + JSON.stringify were faster than JSON.stringify on the original value, but no need to guess! I'll try it out when I get back. We'll see what my posteriors are then.
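The distinction drawn above, between different arguments colliding on one key (a wrong result) and equal arguments getting different keys (only a cache miss), can be sketched on the JVM. This assumes a hypothetical `memoize-by`, loosely analogous to memoizee's `normalizer` option, with `pr-str` as the normalizer; it is an illustration, not the PR's code. Two equal maps whose entries were inserted in a different order print differently, so the second call misses but still returns the correct value.

```clojure
;; Hypothetical memoizer that keys its cache on (normalize args),
;; loosely analogous to memoizee's `normalizer` option.
(defn memoize-by
  [normalize f]
  (let [cache (atom {})]
    (fn [& args]
      (let [k (normalize args)]
        (if (contains? @cache k)
          (get @cache k)
          (let [v (apply f args)]
            (swap! cache assoc k v)
            v))))))

;; Counts how often the underlying function actually runs.
(def calls (atom 0))

(def count-entries
  (memoize-by pr-str
              (fn [m]
                (swap! calls inc)
                (count m))))

;; Equal maps with a different insertion order, hence a different printed form.
(def m1 (array-map :a 1 :b 2))
(def m2 (array-map :b 2 :a 1))
```

Here `(= m1 m2)` is true, but `(pr-str m1)` is `"{:a 1, :b 2}"` while `(pr-str m2)` is `"{:b 2, :a 1}"`, so `count-entries` runs its function once per map and returns 2 both times: a miss, not a wrong answer.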

Contributor Author

Well, it turns out I am very surprised. Unfortunately, I don't think it will work out to use clj->js first.

I ran a quick-and-dirty test in cljs:

;; Assumes the `bean` alias refers to cljs-bean.core, e.g. via
;; (require '[cljs-bean.core :as bean])
(let [n 100000
      m {"x" 0 :foo {:bar 1 :moop/floop "asdf"} :baz [1 2 3]}
      a (reify IAtom)]

  (js/console.log "JSON.stringify:")
  (time
   (dotimes [_ n]
     (js/JSON.stringify m)
     (js/JSON.stringify :foo)
     (js/JSON.stringify a)))

  (js/console.log "clj->js, then JSON.stringify:")
  (time
   (dotimes [_ n]
     (-> m clj->js (js/JSON.stringify))
     (-> :foo clj->js (js/JSON.stringify))
     (-> a clj->js (js/JSON.stringify))))

  (js/console.log "clj->js w/ str keyword-fn, then JSON.stringify:")
  (time
   (dotimes [_ n]
     (-> m (clj->js :keyword-fn str) (js/JSON.stringify))
     (-> :foo (clj->js :keyword-fn str) (js/JSON.stringify))
     (-> a (clj->js :keyword-fn str) (js/JSON.stringify))))

  (js/console.log "bean/->js, then JSON.stringify:")
  (time
   (dotimes [_ n]
     (-> m (bean/->js :key->prop str) (js/JSON.stringify))
     (-> :foo (bean/->js :key->prop str) (js/JSON.stringify))
     (-> a (bean/->js :key->prop str) (js/JSON.stringify)))))

and got:

JSON.stringify:
"Elapsed time: 1274.645467 msecs"
clj->js, then JSON.stringify:
"Elapsed time: 1208.113983 msecs"
clj->js w/ str keyword-fn, then JSON.stringify:
"Elapsed time: 1443.455173 msecs"
bean/->js, then JSON.stringify:
"Elapsed time: 1506.302295 msecs"

Using clj->js before JSON.stringify is slightly faster (about 5% here), but unfortunately it can't distinguish between string and keyword keys. A map with both a string key and a keyword key of the same name would likely cause other problems anyway, so it might be safe to do. But I'd rather err on the side of correctness, especially since these timings aren't far apart.

We can always revisit our caching strategy, if necessary.
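The correctness concern with clj->js can be sketched the same way. In this JVM illustration, a hypothetical `name-keys` normalizer renders map keys with `name`, which, like clj->js with its default `:keyword-fn` of `name`, maps both `:x` and `"x"` to the same string. The two lookups then share one cache entry and the second returns a wrong result; a normalizer that preserves key types (`pr-str` here) keeps them apart. `memoize-by` and `lookup-x` are hypothetical helpers, not part of the PR.

```clojure
;; Hypothetical memoizer keyed on (normalize args), standing in for
;; memoizee's `normalizer` option.
(defn memoize-by
  [normalize f]
  (let [cache (atom {})]
    (fn [& args]
      (let [k (normalize args)]
        (if (contains? @cache k)
          (get @cache k)
          (let [v (apply f args)]
            (swap! cache assoc k v)
            v))))))

(defn name-keys
  "Hypothetical lossy normalizer: renders every map key with `name`,
  so :x and \"x\" produce the same string."
  [args]
  (pr-str (map (fn [m] (into {} (map (fn [[k v]] [(name k) v])) m))
               args)))

;; Looks up the keyword key :x; a map with only the string key "x" has no :x.
(defn lookup-x [m] (get m :x :missing))

(def lossy-lookup (memoize-by name-keys lookup-x))
(def faithful-lookup (memoize-by pr-str lookup-x))
```

After `(lossy-lookup {:x 1})` returns 1, `(lossy-lookup {"x" 1})` also returns 1 from the cache, even though `(lookup-x {"x" 1})` is `:missing`. `faithful-lookup` returns 1 and `:missing` respectively.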

81 changes: 46 additions & 35 deletions src/inferenceql/query/scalar.cljc
@@ -8,6 +8,7 @@
[inferenceql.inference.approximate :as approx]
[inferenceql.inference.gpm :as gpm]
;; [inferenceql.inference.search.crosscat :as crosscat]
+[inferenceql.query.cache :as cache]
#?(:clj [inferenceql.query.generative-table :as generative-table])
[inferenceql.query.literal :as literal]
[inferenceql.query.parser.tree :as tree]
@@ -112,23 +113,29 @@
:env m}))
result)))

-(defn prob
-  [model event]
-  (let [event (inference-event event)]
-    (math/exp (gpm/logprob model event))))
-
-(defn pdf
-  [model event]
-  (let [event (update-keys event str)]
-    (math/exp (gpm/logpdf model event {}))))
-
-(defn condition
-  [model conditions]
-  (let [conditions (-> (medley/filter-vals some? conditions)
-                       (update-keys str))]
-    (cond-> model
-      (seq conditions)
-      (gpm/condition conditions))))
+(def prob
+  (cache/lru
+   (fn prob*
+     [model event]
+     (let [event (inference-event event)]
+       (math/exp (gpm/logprob model event))))))
+
+(def pdf
+  (cache/lru
+   (fn pdf*
+     [model event]
+     (let [event (update-keys event str)]
+       (math/exp (gpm/logpdf model event {}))))))
+
+(def condition
+  (cache/lru
+   (fn condition*
+     [model conditions]
+     (let [conditions (-> (medley/filter-vals some? conditions)
+                          (update-keys str))]
+       (cond-> model
+         (seq conditions)
+         (gpm/condition conditions))))))

(defn condition-all
[model bindings]
@@ -172,24 +179,28 @@
:else (remove nil? form)))))
event))

-(defn constrain
-  [model event]
-  (let [event (-> event
-                  (strip-nils)
-                  (inference-event))]
-    (cond-> model
-      (some? event)
-      (gpm/constrain event
-                     {:operation? operation?
-                      :operands operands
-                      :operator operator
-                      :variable? variable?}))))
-
-(defn mutual-info
-  [model event-a event-b]
-  (let [event-a (inference-event event-a)
-        event-b (inference-event event-b)]
-    (gpm/mutual-info model event-a event-b)))
+(def constrain
+  (cache/lru
+   (fn constrain*
+     [model event]
+     (let [event (-> event
+                     (strip-nils)
+                     (inference-event))]
+       (cond-> model
+         (some? event)
+         (gpm/constrain event
+                        {:operation? operation?
+                         :operands operands
+                         :operator operator
+                         :variable? variable?}))))))
+
+(def mutual-info
+  (cache/lru
+   (fn mutual-info*
+     [model event-a event-b]
+     (let [event-a (inference-event event-a)
+           event-b (inference-event event-b)]
+       (gpm/mutual-info model event-a event-b)))))

(defn approx-mutual-info
[model vars-lhs vars-rhs]
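Each function wrapped in scalar.cljc is cached on all of its arguments, including `model`. The sketch below uses the unbounded `clojure.core/memoize` in place of `cache/lru` and a hypothetical `Scorer` protocol in place of the GPM protocol. `clojure.core/memoize` keys its cache on the argument list using Clojure equality, and `reify` instances that don't override `equals` are equal only to themselves, so two separately built models get separate cache entries even if they behave identically. That `cache/lru` behaves the same way on the JVM is an assumption here.

```clojure
(defprotocol Scorer
  ;; Hypothetical stand-in for a GPM protocol method such as logprob.
  (score [model event]))

(defn make-model
  []
  ;; reify does not override equals/hashCode here, so each instance is
  ;; equal only to itself.
  (reify Scorer
    (score [_ event] (count event))))

;; Counts how often the underlying function actually runs.
(def calls (atom 0))

;; Same shape as (def prob (cache/lru (fn prob* ...))), but with the
;; unbounded clojure.core/memoize in place of cache/lru.
(def prob
  (memoize
   (fn prob* [model event]
     (swap! calls inc)
     (score model event))))
```

With `m1` and `m2` both from `make-model`, calling `(prob m1 {:x 1})` twice runs `prob*` once, while `(prob m2 {:x 1})` runs it again.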
37 changes: 37 additions & 0 deletions test/inferenceql/query/cache_test.cljc
@@ -0,0 +1,37 @@
(ns inferenceql.query.cache-test
  (:require [clojure.test :refer [deftest is testing]]
            [inferenceql.query.cache :as cache]))

(deftest basic-caching
  (let [cache-size 2
        a (atom 0)
        incrementer (fn [_ignored-but-cached-key]
                      (swap! a inc))
        cached-incrementer (cache/lru incrementer cache-size)]

    (is (= 1 (cached-incrementer :foo)))
    (is (= 2 (cached-incrementer :bar)))

    (is (= 1 (cached-incrementer :foo)))
    (is (= 2 (cached-incrementer :bar)))

    (is (= 3 (cached-incrementer :moop)))

    ;; :foo was the least-recently-used entry, so :moop evicted it
    (is (= 4 (cached-incrementer :foo)))))

(deftest disambiguate-between-0-and-nil
  (let [cache-size 1000
        englishize (fn [x]
                     (case x
                       0 "zero"
                       nil "nil"
                       "other"))
        cached-englishize (cache/lru englishize cache-size)]
    ;; Add them both.
    (is (= "zero" (cached-englishize 0)))
    (is (= "nil" (cached-englishize nil)))

    ;; Check that they return the correct values.
    (is (= "zero" (cached-englishize 0)))
    (is (= "nil" (cached-englishize nil)))))
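The `lru` docstring and `basic-caching` describe the eviction policy in terms of observable results. As a rough illustration of what happens underneath, here is a single-threaded LRU memoizer in plain Clojure. `lru-memoize` and its bookkeeping (a recency tick per key) are hypothetical; this is not how clojure.core.memoize or memoizee implement their caches.

```clojure
;; Hypothetical LRU memoizer, for illustration only. Not thread-safe:
;; the lookup and the update are two separate steps.
(defn lru-memoize
  [f threshold]
  ;; :cache maps an argument vector to its result; :last-used maps the
  ;; same key to the tick of its most recent access.
  (let [state (atom {:tick 0 :cache {} :last-used {}})]
    (fn [& args]
      (let [k (vec args)
            cache (:cache @state)
            ;; contains? (not a nil check) so that nil results are cached too
            v (if (contains? cache k)
                (get cache k)
                (apply f args))]
        (swap! state
               (fn [{:keys [tick cache last-used]}]
                 (let [tick (inc tick)
                       cache (assoc cache k v)
                       last-used (assoc last-used k tick)]
                   (if (> (count cache) threshold)
                     ;; Over the threshold: evict the key with the oldest tick.
                     (let [lru-key (key (apply min-key val last-used))]
                       {:tick tick
                        :cache (dissoc cache lru-key)
                        :last-used (dissoc last-used lru-key)})
                     {:tick tick :cache cache :last-used last-used}))))
        v))))
```

Replaying the calls from `basic-caching` (threshold 2; keys `:foo :bar :foo :bar :moop :foo`) returns 1, 2, 1, 2, 3, 4 with this sketch too. Each eviction scans every key, which is O(n) in the threshold; the sketch favors clarity over speed.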